tests: Intermittent failures in thread stress tests under QEMU (stress_aes, stress_recurse, stress_schedule)
Three thread stress tests fail intermittently under QEMU emulation on CI:
thread/stress_aes.py — times out on QEMU ARM/MIPS/RISCV64. Execution
time approaches or exceeds the configured timeout (70-180s depending on
arch). Observed 7 times in a 20-run log window, and responsible for ~28
of 103 failed CI runs over 14 months. On RISCV64 it is excluded entirely
because it takes ~180s against a 200s timeout.
thread/stress_recurse.py — was already excluded from qemu_mips,
qemu_arm and qemu_riscv64 with "is flaky" comments. There are no direct
log observations in the sample window, but only because the exclusion
predates the analysis period.
thread/stress_schedule.py — crashed once (expected PASS, got
CRASH) on qemu_riscv64 in the 20-run window. The frequency is low, but a
crash rather than a timeout suggests a real issue.
These may be QEMU-specific timing/emulation issues rather than bugs in
MicroPython's threading, but the crash in stress_schedule suggests at
least some of these are real.
PR #18861 now ignores these failures in CI. stress_aes.py is additionally
excluded on RISCV64 to avoid burning ~180s of CI time on each timeout.
See analysis: https://gist.github.com/andrewleech/5686ed5242e0948d8679c432579e002e
tests/run-tests.py: Add automatic retry for known-flaky tests.
Summary
The unix port CI workflow on master fails ~18% of the time due to a handful of intermittent test failures (mostly thread-related). This has normalised red CI to the point where contributors ignore it, masking real regressions.
I added a flaky-test allowlist and retry-until-pass mechanism to run-tests.py. When a listed test fails, it is re-run up to two more times and counted as passing on the first success. The 8 tests on the list account for essentially all spurious CI failures on master.
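A minimal sketch of the retry-until-pass idea, assuming the test runner is a callable returning a status string; the names here (RETRY_TESTS, run_with_retry, the "pass-on-retry" status) are illustrative and not the actual identifiers used in run-tests.py:

```python
# Allowlist: test path -> documented reason for retrying.
RETRY_TESTS = {
    "thread/stress_aes.py": "times out under QEMU emulation",
    "thread/stress_schedule.py": "intermittent crash on qemu_riscv64",
}

def run_with_retry(test_path, run_one_test, max_retries=2):
    """Run a test; if it is on the allowlist and fails, re-run it
    up to max_retries more times, stopping at the first pass."""
    result = run_one_test(test_path)
    if result != "fail" or test_path not in RETRY_TESTS:
        return result
    for _ in range(max_retries):
        if run_one_test(test_path) == "pass":
            # Distinguished from a clean pass so the summary can
            # report retried passes separately.
            return "pass-on-retry"
    return "fail"
```

Keeping the allowlist as a dict forces each entry to carry a reason, which keeps the list honest and reviewable.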
tools/ci.sh is updated to remove --exclude patterns for tests now handled by the retry logic, so they get actual coverage again. thread/stress_aes.py stays excluded on RISC-V QEMU where it runs close to the 200s timeout and retries would burn excessive CI time.
Retries are on by default; --no-retry disables them. The end-of-run summary reports tests that passed via retry separately from clean passes.
Testing
Full CI passed on fork (all workflows green). The codecov upload step is also gated on CODECOV_TOKEN being available so it skips cleanly on forks.
Trade-offs and Alternatives
Retry-until-pass can mask a test that has become intermittently broken. The allowlist is deliberately small and each entry documents a reason and optional platform restriction. An alternative would be to mark these tests as expected-fail, but that removes coverage entirely which is worse than the current situation.
The test result status strings ("pass", "fail", "skip") are bare string literals throughout run-tests.py — the retry code follows this existing convention. Worth considering a follow-up to consolidate these into constants for the whole file.
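The follow-up could be as simple as module-level constants; the names below are hypothetical, since run-tests.py currently uses the bare literals:

```python
# Hypothetical constant names for the status strings used in run-tests.py.
STATUS_PASS = "pass"
STATUS_FAIL = "fail"
STATUS_SKIP = "skip"

def is_failure(status):
    # Comparing against a constant turns a typo into a NameError
    # rather than a silently-false string comparison.
    return status == STATUS_FAIL
```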
Generative AI
I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.