QUERY · ISSUE

ESP32: Threading with 2 or 3 similar threads not properly scheduled

openby rkompassopened 2022-11-26updated 2023-01-06

bugport-esp32

Platform: ESP32 Wroom
MP-Version: MicroPython v1.19.1 on 2022-06-18; ESP32 module with ESP32

I observed that the computation performance of 2 or 3 identical threads started in parallel is very unevenly distributed, the worst case being one of 3 threads getting only 1 % of computation power.
Further investigation revealed that there are very long durations during which a single thread is not activated. This may be the reason for the first observation.
A test program appended below after a few runs exhibits the following output,

Threads tmin:         0         0 ms
Threads tmax:      9981        10 ms
Iterations:         957    302461

showing that the first of 2 threads is not active in 9.98 seconds of 10 seconds total, resulting in only 957 iterations instead of about 150000 which would be expected in fair scheduling.
This problem seems to vanish if there are 4 or more similar threads active, then the tmax take quite similar values of 20-40 ms.
This is only observed on esp32. Testing of Pyboard and Unix port of MP revealed fair scheduling.

The test program:

import _thread
from time import sleep, sleep_ms
from machine import idle, Timer

flag = False
tim0 = Timer(0)
tim0.init(period=20000, mode=Timer.ONE_SHOT, callback=lambda t:print('interrupt 0'))
rmin = [0, 0]
rmax = [0, 0]
ret = [0, 0]

def th_tims(i):
    global rmin, rmax, ret
    while not flag:
        idle()
    tmin = 1_000_000; tmax = 0; n = 0
    t0 = tim0.value()
    t1 = tim0.value()
    td = t1 - t0        #   put some computation here, 
    t1 = min(td, tmin)  # .. so we don't produce the minimum here
    t1 = max(td, tmin)
    t1 = min(td, tmin)
    t1 = max(td, tmin)
    while flag:
        t1 = tim0.value()    # get weird results without the lock (4, 5, 10 seconds for max) 
        td = t1 - t0
        tmin = min(td, tmin)
        tmax = max(td, tmax)
        t0 = t1
        n +=1
    rmin[i] = tmin
    rmax[i] = tmax
    ret[i] = n

_thread.start_new_thread(th_tims, (0,))
_thread.start_new_thread(th_tims, (1,))

flag = True
sleep(10)
flag = False
sleep_ms(50)
print('Threads tmin: {:9d} {:9d} ms'.format(rmin[0], rmin[1]))
print('Threads tmax: {:9d} {:9d} ms'.format(rmax[0], rmax[1]))
print('Iterations:   {:9d} {:9d}'.format(ret[0], ret[1]))

From an initial version with machine.ticks_us() I switched to hardware timer, suspecting that ticks_us() was not thread-safe.
This is wrong. It was discussed here.

As a partial workaround the introduction of a lock (aquired and released in the while flag: loop) could be used. The threads then have breaks of up to 0.5 seconds and overall performance is almost halved, but scheduling is quite fair.

CANDIDATE · ISSUE

esp32: threading: stack check seems seems wrong

closedby c0d3z3r0opened 2019-06-10updated 2019-06-11

port-esp32

Hi there,

my application always raised an recursion exceed exception so I decided to do some test.
First some basic information:

esp32 (esp-wroom-32)
stack lenght increased to 50 (50*1024) for testing (same issues with the default of 16)
master branch

This is my test code, a simple recursive function:

def test(n):
    print(n)
    test(n+1)

Test 1: STACK_CHECK=0, no threading

>>> test(0)
0
...
157
1***ERROR*** A stack overflow in task mp_task has been detected.
abort() was called at PC 0x4008d5bc on core 1

Backtrace: 0x4008d2a7:0x3ffbd700 0x4008d5a5:0x3ffbd720 0x4008d5bc:0x3ffbd740 0x4008f3ea:0x3ffbd760 0x4008e558:0x3ffbd790 0x4008e50e:0x00000000

Rebooting...

Test 2: STACK_CHECK=1, no threading

>>> test(0)
0
...
153
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
  File "<stdin>", line 2, in test
RuntimeError: maximum recursion depth exceeded

Test 3: STACK_CHECK=0, threaded

>>> _thread.start_new_thread(test, (0,))
0
...
54
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x4011431c  PS      : 0x00060330  A0      : 0x80118905  A1      : 0x3ffbb9d0  
A2      : 0x00000029  A3      : 0x3ffd1e28  A4      : 0x3ffb5188  A5      : 0x00000000  
A6      : 0x0000000a  A7      : 0x00000001  A8      : 0x00000000  A9      : 0x3ffbb9b0  
A10     : 0x3ffbaaf8  A11     : 0x3ffbb9b0  A12     : 0x3ffb0fac  A13     : 0x3ffbb9a0  
A14     : 0x3ffb0fac  A15     : 0x2a0a4fd8  SAR     : 0x00000019  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000033  LBEG    : 0x40116045  LEND    : 0x4011604e  LCOUNT  : 0x00000000  

Backtrace: 0x4011431c:0x3ffbb9d0 0x40118902:0x3ffbb9f0 0x401189e1:0x3ffbba10 0x4010e0df:0x3ffbba40 0x4008e63d:0x3ffbba70

Rebooting...

Test 4: STACK_CHECK=1, threaded

>>> _thread.start_new_thread(test, (0,))
0
...
10
Unhandled exception in thread started by <function test at 0x3ffe4eb0>
Traceback (most recent call last):
  File "<stdin>", line 3, in test
...
  File "<stdin>", line 2, in test
RuntimeError: maximum recursion depth exceeded

Well, the first two tests look ok. There's not much different in maximum stack size.

Where does this big difference in stack size (~150 vs. ~50) between threaded and unthreaded calls come from?

Test 4 looks like a bug, doesn't it? The maximum stack size should be about ~50.

ESP32: Threading with 2 or 3 similar threads not properly scheduled

esp32: threading: stack check seems seems wrong

Keyboard