ESP32: Threading with 2 or 3 similar threads not properly scheduled
Platform: ESP32 Wroom
MP-Version: MicroPython v1.19.1 on 2022-06-18; ESP32 module with ESP32
I observed that the computation performance of 2 or 3 identical threads started in parallel is very unevenly distributed, the worst case being one of 3 threads getting only 1 % of computation power.
Further investigation revealed that there are very long durations during which a single thread is not activated. This may be the reason for the first observation.
A test program appended below after a few runs exhibits the following output,
Threads tmin: 0 0 ms
Threads tmax: 9981 10 ms
Iterations: 957 302461
showing that the first of 2 threads is not active in 9.98 seconds of 10 seconds total, resulting in only 957 iterations instead of about 150000 which would be expected in fair scheduling.
This problem seems to vanish if there are 4 or more similar threads active, then the tmax take quite similar values of 20-40 ms.
This is only observed on esp32. Testing of Pyboard and Unix port of MP revealed fair scheduling.
The test program:
import _thread
from time import sleep, sleep_ms
from machine import idle, Timer
flag = False
tim0 = Timer(0)
tim0.init(period=20000, mode=Timer.ONE_SHOT, callback=lambda t:print('interrupt 0'))
rmin = [0, 0]
rmax = [0, 0]
ret = [0, 0]
def th_tims(i):
global rmin, rmax, ret
while not flag:
idle()
tmin = 1_000_000; tmax = 0; n = 0
t0 = tim0.value()
t1 = tim0.value()
td = t1 - t0 # put some computation here,
t1 = min(td, tmin) # .. so we don't produce the minimum here
t1 = max(td, tmin)
t1 = min(td, tmin)
t1 = max(td, tmin)
while flag:
t1 = tim0.value() # get weird results without the lock (4, 5, 10 seconds for max)
td = t1 - t0
tmin = min(td, tmin)
tmax = max(td, tmax)
t0 = t1
n +=1
rmin[i] = tmin
rmax[i] = tmax
ret[i] = n
_thread.start_new_thread(th_tims, (0,))
_thread.start_new_thread(th_tims, (1,))
flag = True
sleep(10)
flag = False
sleep_ms(50)
print('Threads tmin: {:9d} {:9d} ms'.format(rmin[0], rmin[1]))
print('Threads tmax: {:9d} {:9d} ms'.format(rmax[0], rmax[1]))
print('Iterations: {:9d} {:9d}'.format(ret[0], ret[1]))
From an initial version with machine.ticks_us() I switched to hardware timer, suspecting that ticks_us() was not thread-safe.
This is wrong. It was discussed here.
As a partial workaround the introduction of a lock (aquired and released in the while flag: loop) could be used. The threads then have breaks of up to 0.5 seconds and overall performance is almost halved, but scheduling is quite fair.
esp32: threading: stack check seems seems wrong
Hi there,
my application always raised an recursion exceed exception so I decided to do some test.
First some basic information:
- esp32 (esp-wroom-32)
- stack lenght increased to 50 (50*1024) for testing (same issues with the default of 16)
- master branch
This is my test code, a simple recursive function:
def test(n):
print(n)
test(n+1)
Test 1: STACK_CHECK=0, no threading
>>> test(0)
0
...
157
1***ERROR*** A stack overflow in task mp_task has been detected.
abort() was called at PC 0x4008d5bc on core 1
Backtrace: 0x4008d2a7:0x3ffbd700 0x4008d5a5:0x3ffbd720 0x4008d5bc:0x3ffbd740 0x4008f3ea:0x3ffbd760 0x4008e558:0x3ffbd790 0x4008e50e:0x00000000
Rebooting...
Test 2: STACK_CHECK=1, no threading
>>> test(0)
0
...
153
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
File "<stdin>", line 2, in test
RuntimeError: maximum recursion depth exceeded
Test 3: STACK_CHECK=0, threaded
>>> _thread.start_new_thread(test, (0,))
0
...
54
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x4011431c PS : 0x00060330 A0 : 0x80118905 A1 : 0x3ffbb9d0
A2 : 0x00000029 A3 : 0x3ffd1e28 A4 : 0x3ffb5188 A5 : 0x00000000
A6 : 0x0000000a A7 : 0x00000001 A8 : 0x00000000 A9 : 0x3ffbb9b0
A10 : 0x3ffbaaf8 A11 : 0x3ffbb9b0 A12 : 0x3ffb0fac A13 : 0x3ffbb9a0
A14 : 0x3ffb0fac A15 : 0x2a0a4fd8 SAR : 0x00000019 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000033 LBEG : 0x40116045 LEND : 0x4011604e LCOUNT : 0x00000000
Backtrace: 0x4011431c:0x3ffbb9d0 0x40118902:0x3ffbb9f0 0x401189e1:0x3ffbba10 0x4010e0df:0x3ffbba40 0x4008e63d:0x3ffbba70
Rebooting...
Test 4: STACK_CHECK=1, threaded
>>> _thread.start_new_thread(test, (0,))
0
...
10
Unhandled exception in thread started by <function test at 0x3ffe4eb0>
Traceback (most recent call last):
File "<stdin>", line 3, in test
...
File "<stdin>", line 2, in test
RuntimeError: maximum recursion depth exceeded
Well, the first two tests look ok. There's not much different in maximum stack size.
Where does this big difference in stack size (~150 vs. ~50) between threaded and unthreaded calls come from?
Test 4 looks like a bug, doesn't it? The maximum stack size should be about ~50.