QUERY · ISSUE

Set threshold for Coveralls failure

closedby dlechopened 2021-06-24updated 2026-05-01

proposed-close

Since MicroPython coverage tests have non-deterministic behavior, many lines of code are executed a different number of times in each CI run. Coveralls takes this number into account when calculating the coverage, so often we get "failures" due to variance between runs even though nothing actually changed. Coveralls provides a threshold setting for what constitutes failure. This could be set (or increased if it is already set) to reduce the number of nuisance "failures".

CANDIDATE · PULL REQUEST

tests/run-tests.py: Add automatic retry for known-flaky tests.

mergedby andrewleechopened 2026-02-23updated 2026-03-11

tests

Summary

The unix port CI workflow on master fails ~18% of the time due to a handful of intermittent test failures (mostly thread-related). This has normalised red CI to the point where contributors ignore it, masking real regressions.

I added a flaky-test allowlist and retry-until-pass mechanism to run-tests.py. When a listed test fails, it's re-run up to 2 more times and passes on the first success. The 8 tests on the list account for essentially all spurious CI failures on master.

tools/ci.sh is updated to remove --exclude patterns for tests now handled by the retry logic, so they get actual coverage again. thread/stress_aes.py stays excluded on RISC-V QEMU where it runs close to the 200s timeout and retries would burn excessive CI time.

Retries are on by default; --no-retry disables them. The end-of-run summary reports tests that passed via retry separately from clean passes.

Testing

Full CI passed on fork (all workflows green). The codecov upload step is also gated on CODECOV_TOKEN being available so it skips cleanly on forks.

Trade-offs and Alternatives

Retry-until-pass can mask a test that has become intermittently broken. The allowlist is deliberately small and each entry documents a reason and optional platform restriction. An alternative would be to mark these tests as expected-fail, but that removes coverage entirely which is worse than the current situation.

The test result status strings ("pass", "fail", "skip") are bare string literals throughout run-tests.py — the retry code follows this existing convention. Worth considering a follow-up to consolidate these into constants for the whole file.

Generative AI

I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.