#18868 · PR #17993
Related · high · value 0.454
QUERY · ISSUE

tests: cmdline/repl_lock.py and repl_cont.py intermittent failures

open · by andrewleech · opened 2026-02-25 · updated 2026-03-19
tests

Two REPL tests fail intermittently on CI:

cmdline/repl_lock.py — fails on QEMU ARM and RISCV64. The expected
output shows >>> micropython.heap_lock() but the actual output drops
the >>> prompt prefix. Observed 3 times in 20 runs with logs. This is
a REPL prompt timing issue under QEMU emulation.

cmdline/repl_cont.py — fails on macOS. Differences in quote escaping
in REPL continuation prompts ("'" vs '\''). Observed once in 20 runs.
The macOS job has historically been the second most failure-prone job
(4.3% failure rate, 25 failures over 14 months) with all failures
attributed to REPL-related issues. The August 2025 spike (11 macOS
failures) correlates with the GitHub Actions macOS 15 runner migration.

PR #18861 now ignores these failures in CI.

See analysis: https://gist.github.com/andrewleech/5686ed5242e0948d8679c432579e002e

CANDIDATE · PULL REQUEST

tests/thread/thread_gc1: Skip unreliable test in Github CI.

merged · by AJMansfield · opened 2025-08-27 · updated 2025-09-12
tests

Summary

thread/thread_gc1.py is a constant source of spurious failures in GitHub CI.

This PR adds it to the list of tests skipped when running on GitHub CI on any of macos, qemu_riscv64, qemu_mips, or qemu_arm. The aim is to reduce the overall false positive rate and improve the predictive value of a test failure.

Testing

I examined a sample of the last 25 unix port Github Actions runs, tabulated their outcomes and the causes attributable to any failures, and examined relevant statistics over the results to

| Action Run | Outcome | Failed Job(s) | Cause |
|---|---|---|---|
| 17256297609 | FAIL | macos | thread/thread_gc1.py |
| 17248166934 | PASS | | |
| 17239364181 | FAIL | macos | thread/thread_gc1.py |
| 17239346523 | PASS | | |
| 17232449204 | FAIL | macos | thread/thread_gc1.py (possibly valid) |
| 17230929159 | FAIL | qemu_arm | cmdline/repl_sys_ps1_ps2.py |
| 17230082929 | PASS | | |
| 17226283109 | FAIL | settrace_stackless<br>macos<br>qemu_mips | thread/thread_gc1.py<br>thread/thread_gc1.py<br>thread/thread_gc1.py |
| 17226266333 | PASS | | |
| 17225202917 | FAIL | macos<br>qemu_riscv64 | thread/thread_gc1.py<br>thread/thread_gc1.py |
| 17224743621 | FAIL | settrace_stackless<br>macos | thread/thread_gc1.py<br>thread/thread_gc1.py |
| 17224739270 | FAIL | macos | thread/thread_gc1.py |
| 17220251949 | FAIL | macos | thread/thread_gc1.py |
| 17218037418 | PASS | | |
| 17218024199 | PASS | | |
| 17212060390 | FAIL | macos | thread/thread_gc1.py |
| 17211892105 | CANCEL | | |
| 17209911695 | FAIL | macos | thread/thread_gc1.py |
| 17209904205 | FAIL | macos | thread/thread_gc1.py |
| 17196446007 | PASS | | |
| 17196132542 | FAIL | macos | thread/thread_gc1.py |
| 17180766768 | PASS | | |
| 17175320257 | PASS | | |
| 17175019154 | FAIL | macos<br>qemu_mips | thread/thread_gc1.py<br>thread/thread_gc1.py |
| 17175013008 | CANCEL | | |

Of the 14 failed runs observed in this sample, all but one were attributable to thread/thread_gc1.py, with all but one of these failures happening on macos or qemu. (Note that one of these changes did touch thread code, so for the sake of robustness I've assumed it's actually a true positive in my analysis.)
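The tallies can be reproduced by transcribing the table into a small script. The sketch below is only a cross-check of the counts, not part of the PR. The tuple layout and the variable names are my own choices.

```python
from collections import Counter

GC1 = "thread/thread_gc1.py"
REPL = "cmdline/repl_sys_ps1_ps2.py"

# (run id, outcome, [(failed job, failing test), ...]), transcribed from the table.
RUNS = [
    (17256297609, "FAIL", [("macos", GC1)]),
    (17248166934, "PASS", []),
    (17239364181, "FAIL", [("macos", GC1)]),
    (17239346523, "PASS", []),
    (17232449204, "FAIL", [("macos", GC1)]),  # marked "possibly valid"
    (17230929159, "FAIL", [("qemu_arm", REPL)]),
    (17230082929, "PASS", []),
    (17226283109, "FAIL", [("settrace_stackless", GC1), ("macos", GC1), ("qemu_mips", GC1)]),
    (17226266333, "PASS", []),
    (17225202917, "FAIL", [("macos", GC1), ("qemu_riscv64", GC1)]),
    (17224743621, "FAIL", [("settrace_stackless", GC1), ("macos", GC1)]),
    (17224739270, "FAIL", [("macos", GC1)]),
    (17220251949, "FAIL", [("macos", GC1)]),
    (17218037418, "PASS", []),
    (17218024199, "PASS", []),
    (17212060390, "FAIL", [("macos", GC1)]),
    (17211892105, "CANCEL", []),
    (17209911695, "FAIL", [("macos", GC1)]),
    (17209904205, "FAIL", [("macos", GC1)]),
    (17196446007, "PASS", []),
    (17196132542, "FAIL", [("macos", GC1)]),
    (17180766768, "PASS", []),
    (17175320257, "PASS", []),
    (17175019154, "FAIL", [("macos", GC1), ("qemu_mips", GC1)]),
    (17175013008, "CANCEL", []),
]

# Count runs by outcome: FAIL=14, PASS=9, CANCEL=2.
outcomes = Counter(outcome for _, outcome, _ in RUNS)
# Count failed runs in which thread_gc1 failed on at least one job: 13.
gc1_runs = sum(1 for _, _, fails in RUNS if any(test == GC1 for _, test in fails))
# Count individual job failures per job name.
job_failures = Counter(job for _, _, fails in RUNS for job, _ in fails)

print(dict(outcomes))
print(gc1_runs)
print(job_failures.most_common())
```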

This test has a false positive rate of 59% over this sample, an F1 score of 0.13, and a positive predictive value of 7.14% (i.e., when the test suite reports a failure for a PR, the chance that the failure is due to the PR's change is only about 7%, because of this test).
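The PR does not spell out the confusion matrix behind these figures. One reading that reproduces all three numbers, offered as an assumption, is this: the single "possibly valid" failure is the only true positive, the other 13 failed runs are false positives, the 9 passing runs are true negatives, there are no false negatives, and the 2 cancelled runs are excluded. A minimal sketch under that assumption:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from raw counts."""
    ppv = tp / (tp + fp)       # positive predictive value (precision)
    fpr = fp / (fp + tn)       # false positive rate
    recall = tp / (tp + fn)    # true positive rate
    f1 = 2 * ppv * recall / (ppv + recall)
    return {"ppv": ppv, "fpr": fpr, "recall": recall, "f1": f1}


# Assumed counts (not stated in the PR): 1 true positive, 13 false positives,
# 9 true negatives, 0 false negatives; cancelled runs are left out.
m = confusion_metrics(tp=1, fp=13, tn=9, fn=0)

print(f"false positive rate = {m['fpr']:.0%}")        # 59%
print(f"positive predictive value = {m['ppv']:.2%}")  # 7.14%
print(f"F1 score = {m['f1']:.2f}")                    # 0.13
```

Under these counts the false positive rate is 13/22, the positive predictive value is 1/14, and the F1 score is 2/15, which match the rounded figures quoted above.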

This test should therefore be disabled in these scenarios where it's unreliable.
