stress_schedule consistently fails with native emitter
Port, board and/or hardware
unix port, standard variant, x86_64, debian trixie
MicroPython version
MicroPython v1.27.0-preview.388.g27544a2d81 on 2025-11-13; linux [GCC 14.2.0] version
Reproduction
$ make -C ports/unix
$ ports/unix/build-standard/micropython -X emit=native tests/thread/stress_schedule.py
Expected behaviour
Expected to print "PASS"
Observed behaviour
After about 10 seconds, it prints "4", meaning that only 4 scheduled tasks ran during the test. This happens consistently.
Additional Information
It works when "-X emit=native" is omitted.
On the "good" side of the bisect it sometimes fails, but it prints a large number like 9158, meaning it almost succeeded. The "bad" side always prints 4.
I don't understand why this isn't seen in CI; perhaps some detail of threading on Debian trixie differs from Ubuntu 24.04, or perhaps my computer's configuration is different in a relevant way (12-thread Intel i5-1235U).
I attempted to diagnose this further and bisected it to the following commit (ping @projectgus):
52a593cdb14ed732b5580bbed39c0325815adedf is the first bad commit
commit 52a593cdb14ed732b5580bbed39c0325815adedf
Author: Angus Gratton <angus@redyak.com.au>
Date: Wed Sep 4 17:17:38 2024 +1000
py/scheduler: Only run callbacks on the main thread if GIL is disabled.
Otherwise it's very difficult to reason about thread safety in a
scheduler callback, as it can run at any time on any thread - including
racing against any bytecode operation on any thread.
This work was funded through GitHub Sponsors.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
docs/library/micropython.rst | 8 ++++++++
py/scheduler.c | 8 +++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
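The gating rule described in this commit message can be modelled in a few lines of standard CPython. This is a hypothetical sketch, not the code in py/scheduler.c: the class `ToyScheduler`, its `gil_enabled` flag and its `run_pending()` method are invented for illustration, and `threading.main_thread()` stands in for MicroPython's notion of the main thread.

```python
import threading
from collections import deque


class ToyScheduler:
    """Hypothetical model of a callback queue (not MicroPython's py/scheduler.c)."""

    def __init__(self, gil_enabled):
        self.gil_enabled = gil_enabled
        self.queue = deque()
        self.lock = threading.Lock()  # protects the queue across threads

    def schedule(self, func, arg):
        # Any thread may enqueue a callback.
        with self.lock:
            self.queue.append((func, arg))

    def may_run_here(self):
        # With a GIL, whichever thread polls may run callbacks.
        # Without a GIL, only the main thread may (the rule the commit introduces).
        return self.gil_enabled or threading.current_thread() is threading.main_thread()

    def run_pending(self):
        """Run queued callbacks if this thread is allowed to; return how many ran."""
        if not self.may_run_here():
            return 0
        ran = 0
        while True:
            with self.lock:
                if not self.queue:
                    return ran
                func, arg = self.queue.popleft()
            func(arg)  # run outside the lock so callbacks may schedule more work
            ran += 1
```

With `gil_enabled=False`, a worker thread that polls runs nothing and the callbacks wait for the main thread; with `gil_enabled=True`, whichever thread polls drains the queue. In this model, callbacks queued while the main thread isn't polling simply wait.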
<details>
<summary>git bisect log</summary>
(micropython) jepler@bert:~/src/micropython/ports/unix$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [8cd15829e293f01dae91b6a2d4a995bfaeca887b] all: Bump version to 1.22.2.
git bisect good 8cd15829e293f01dae91b6a2d4a995bfaeca887b
# status: waiting for bad commit, 1 good commit known
# bad: [b15348415e9d5ad2a978ca38a8da356faee88e91] extmod/modframebuf: Add support for blit'ing read-only data.
git bisect bad b15348415e9d5ad2a978ca38a8da356faee88e91
# good: [9feb0689eeaca5ce88aedcc680f997a3b4d0221c] all: Bump version to 1.22.0.
git bisect good 9feb0689eeaca5ce88aedcc680f997a3b4d0221c
# bad: [e5eeaa7df894b5062c189f5d8da44c67550cf43a] docs/reference/mpremote: Update docs to mention new features.
git bisect bad e5eeaa7df894b5062c189f5d8da44c67550cf43a
# good: [03cf4d4980af604ebae6928f51a311b53d98e2c8] rp2/boards/W5500_EVB_PICO: Update incorrect url in board.json.
git bisect good 03cf4d4980af604ebae6928f51a311b53d98e2c8
# good: [052693e4495354daabb4d61e97121283334c6665] esp32/boards: Reduce IRAM usage.
git bisect good 052693e4495354daabb4d61e97121283334c6665
# good: [3294606e2319dd64226434f20acc15b8869ddf55] extmod/libmetal: Fix libmetal rules for mkdir dependencies.
git bisect good 3294606e2319dd64226434f20acc15b8869ddf55
# bad: [eec5eb4260ebc76694f751f0966f4c8236da9609] stm32/uart: Add UART RX/CTS pin pull config options.
git bisect bad eec5eb4260ebc76694f751f0966f4c8236da9609
# good: [ded8bbdd5efc50d2b86e3020462ced8bbcf4d25c] rp2/machine_pin_cyw43: Include check for CYW43_GPIO.
git bisect good ded8bbdd5efc50d2b86e3020462ced8bbcf4d25c
# bad: [d775db72b9bf01d2f40f6cf4d4d98941f30b7bd2] esp32/boards/UM_FEATHERS3NEO: Add FeatherS3 Neo board definition.
git bisect bad d775db72b9bf01d2f40f6cf4d4d98941f30b7bd2
# bad: [7b5738ad86f11eef682c5a649754777d2935a156] tools/ci.sh: Clean up the Unix port's MIPS target.
git bisect bad 7b5738ad86f11eef682c5a649754777d2935a156
# bad: [5d8878b582b8b68d19ab02adfe32d683d5ea512f] shared/tinyusb: Only run TinyUSB on the main thread if GIL is disabled.
git bisect bad 5d8878b582b8b68d19ab02adfe32d683d5ea512f
# bad: [52a593cdb14ed732b5580bbed39c0325815adedf] py/scheduler: Only run callbacks on the main thread if GIL is disabled.
git bisect bad 52a593cdb14ed732b5580bbed39c0325815adedf
# good: [451ba1cf386a2a0874ea20ea593dd6a009ede011] rp2/modules: Fix FatFS boot script to detect invalid FAT filesystem.
git bisect good 451ba1cf386a2a0874ea20ea593dd6a009ede011
# first bad commit: [52a593cdb14ed732b5580bbed39c0325815adedf] py/scheduler: Only run callbacks on the main thread if GIL is disabled.
</details>
Code of Conduct
Yes, I agree
py/scheduler,rp2: Avoid scheduler race conditions when GIL disabled, fix "TinyUSB callback can't recurse" error
Summary
It looks like there has been a bug since v1.22 when using rp2 threads, where either CPU may poll CDC input and trigger the TinyUSB task. TinyUSB could therefore run on both CPUs concurrently, which may have led to some incorrect behaviour.
The race started triggering an exception when runtime USB support was added, along with a check that the USB task doesn't recurse into itself from a Python handler function. The race can trigger this check incorrectly, because the flag indicating that the TinyUSB task is running may have been set by the other CPU.
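Why a single "task running" flag misfires can be shown with a deterministic interleaving. This is a hypothetical model, not the rp2 or TinyUSB code: `UsbTaskGuard`, `enter()` and `exit()` are invented names, and only the error text is taken from the PR title.

```python
class UsbTaskGuard:
    """Hypothetical model of one process-wide 'USB task running' flag."""

    def __init__(self):
        self.running = False  # shared by both CPUs, not tracked per thread

    def enter(self):
        if self.running:
            # Meant to catch a Python handler re-entering the USB task, but it
            # cannot tell re-entry apart from the other CPU already being inside.
            raise RuntimeError("TinyUSB callback can't recurse")
        self.running = True

    def exit(self):
        self.running = False


guard = UsbTaskGuard()
guard.enter()  # "CPU0" starts the USB task
try:
    guard.enter()  # "CPU1" polls CDC input concurrently: no recursion, yet it raises
except RuntimeError as exc:
    print(exc)
guard.exit()
```

Running the USB task on only one thread removes the second caller, so the flag can again only be seen as set by genuine re-entry.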
The race is most commonly triggered when working from the interactive REPL; even a minimal running thread can trigger it. This commit adds a test case that triggers it in a different way (polling stdin from a thread).
The root cause is that, on threaded ports without a GIL, the scheduler can run on any thread, which creates potential for race conditions that are hard to avoid. So the first fix is to run all scheduler callbacks on the main thread if the GIL is disabled.
The secondary fix is to not run the TinyUSB task when it is polled from a thread other than the main thread, if the GIL is disabled. Instead, we schedule the TinyUSB task to run on the main thread.
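The decision in the secondary fix can be sketched as a small dispatch function. This is a hypothetical model in CPython, not the code in shared/tinyusb: `poll_usb`, `run_task` and `schedule_task` are invented stand-ins, and the return value exists only so the choice can be inspected.

```python
import threading


def poll_usb(gil_enabled, run_task, schedule_task):
    """Hypothetical model: decide whether to run the USB task now or defer it.

    run_task is the work to do; schedule_task queues a callable for the
    main thread. Returns "ran" or "scheduled".
    """
    on_main = threading.current_thread() is threading.main_thread()
    if gil_enabled or on_main:
        run_task()
        return "ran"
    # GIL disabled and polled from another thread: hand the task to the
    # main thread's scheduler instead of running it here.
    schedule_task(run_task)
    return "scheduled"
```

Deferring through the scheduler, rather than adding more locking around TinyUSB, means that with the GIL disabled the task only ever executes on the main thread.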
Closes #15390.
Trade-offs and Alternatives
We could make the scheduler always run only on the main thread, but with the GIL enabled this has quite an impact on scheduler latency, and it isn't necessary for correct behaviour. There is a branch linked from a comment below that mostly works around this, but it's a more complex change (and will still have higher latency than running the callback immediately in whatever thread currently holds the GIL).
Testing
- Ran all thread tests, including the new test case, on the rp2 port with and without MICROPY_HW_ENABLE_USB_RUNTIME_DEVICE set; verified it no longer raises an exception.
- Ran all thread tests on the esp32 port; verified no regression when the GIL is enabled.
This work was funded through GitHub Sponsors.