QUERY · ISSUE

Set threshold for Coveralls failure

closed · by dlech · opened 2021-06-24 · updated 2026-05-01

proposed-close

Since MicroPython coverage tests have non-deterministic behavior, many lines of code are executed a different number of times in each CI run. Coveralls takes this number into account when calculating the coverage, so often we get "failures" due to variance between runs even though nothing actually changed. Coveralls provides a threshold setting for what constitutes failure. This could be set (or increased if it is already set) to reduce the number of nuisance "failures".
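To illustrate the idea, here is a minimal sketch of how a decrease threshold would suppress nuisance failures. The function name `coverage_check_fails` and the percentages are hypothetical and chosen for illustration only; this is not Coveralls' API or its actual implementation.

```python
def coverage_check_fails(previous_pct, current_pct, decrease_threshold_pct=0.0):
    """Return True if the drop in coverage exceeds the allowed threshold.

    Hypothetical helper: models a "fail only if coverage drops by more
    than N percentage points" rule, not Coveralls' real logic.
    """
    drop = previous_pct - current_pct
    return drop > decrease_threshold_pct


# Illustrative numbers: a small run-to-run wobble in reported coverage.
print(coverage_check_fails(98.40, 98.37))        # no threshold: flagged as a failure
print(coverage_check_fails(98.40, 98.37, 0.1))   # 0.1-point threshold: tolerated
```

With a threshold slightly larger than the typical run-to-run variance, only real regressions would be reported as failures.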

CANDIDATE · ISSUE

tests: thread/thread_gc1.py intermittent failure on CI

open · by andrewleech · opened 2026-02-25 · updated 2026-03-19

tests

The thread_gc1.py test fails intermittently on CI with False instead of
True. This is the single biggest contributor to CI flakiness on master,
attributed to ~62 of 103 failed runs over 14 months (575 runs sampled).

Observed in the settrace_stackless job (6 times) and the coverage job
(3 times) in a 20-run window with available logs. The test was already
excluded from the macos, qemu_mips, qemu_arm, and qemu_riscv64 jobs prior
to PR #18861.

The test spawns threads that perform garbage collection and checks a
boolean result. The failure pattern suggests a race condition in the GC
or thread interaction, not a test logic issue — the test is correctly
detecting a real bug.

Estimated per-execution failure rate: ~1.3% across the 8 CI jobs that
run it.
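As a rough sanity check, the per-execution rate can be related to the per-run figures quoted earlier. The sketch below assumes the test runs once in each of the 8 jobs and that failures are independent across jobs; both are simplifying assumptions, and the function name is hypothetical.

```python
def run_failure_probability(per_execution_rate, executions_per_run):
    """Probability that at least one execution fails in a single CI run,
    assuming independent executions with the same failure rate."""
    return 1.0 - (1.0 - per_execution_rate) ** executions_per_run


estimated = run_failure_probability(0.013, 8)  # ~1.3% per execution, 8 jobs
observed = 62 / 575                            # failed runs attributed to this test

print(f"estimated per-run failure rate: {estimated:.3f}")
print(f"observed per-run failure rate:  {observed:.3f}")
```

Under these assumptions the estimate comes out at roughly 10% of runs, which is in the same range as the 62 of 575 sampled runs attributed to this test.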

PR #18861 now ignores this failure in CI so it doesn't block other work,
but the underlying issue should be fixed.

See analysis: https://gist.github.com/andrewleech/5686ed5242e0948d8679c432579e002e
