Set threshold for Coveralls failure
Because the MicroPython coverage tests behave non-deterministically, many lines of code are executed a different number of times in each CI run. Coveralls takes these execution counts into account when calculating coverage, so we often get "failures" caused by run-to-run variance even though nothing actually changed. Coveralls provides a threshold setting for what constitutes a failure. Setting it (or increasing it, if it is already set) would reduce the number of nuisance "failures".
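
As a rough illustration of the idea, the sketch below shows how a decrease threshold absorbs small run-to-run variance. The function name, its parameters, and the percentages are hypothetical; this is not Coveralls' implementation or its configuration syntax.

```python
def coverage_check_passes(previous_pct, current_pct, decrease_threshold_pct):
    """Return True if the coverage drop stays within the allowed threshold."""
    drop = previous_pct - current_pct  # positive when coverage went down
    return drop <= decrease_threshold_pct


# Hypothetical numbers: a 0.03-point dip caused only by run-to-run variance.
print(coverage_check_passes(98.38, 98.35, 0.0))  # zero tolerance
print(coverage_check_passes(98.38, 98.35, 0.1))  # small tolerance
print(coverage_check_passes(98.38, 97.00, 0.1))  # a larger drop
```

With zero tolerance the small dip is reported as a failure; with a 0.1-point tolerance it passes, while the larger drop still fails.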

tests: thread/thread_gc1.py intermittent failure on CI
The thread_gc1.py test fails intermittently on CI, producing False instead
of True. It is the single biggest contributor to CI flakiness on master,
accounting for ~62 of 103 failed runs over 14 months (575 runs sampled).
In a 20-run window with available logs, it failed 6 times in the
settrace_stackless job and 3 times in the coverage job. The test was
already excluded from the macos, qemu_mips, qemu_arm, and qemu_riscv64
jobs prior to PR #18861.
The test spawns threads that perform garbage collection and checks a
boolean result. The failure pattern suggests a race condition in the GC
or in its interaction with threads, not a test logic issue: the test is
correctly detecting a real bug.
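
For readers unfamiliar with this kind of test, here is a minimal sketch of the general shape described above. It is not a copy of thread_gc1.py: it uses CPython's `threading` module, and the function name, thread count, and iteration count are illustrative assumptions.

```python
import gc
import threading


def run_gc_thread_check(n_threads=4, n_iterations=10):
    """Spawn threads that force GC, then report whether all data survived."""
    lock = threading.Lock()
    counts = {"correct": 0, "finished": 0}

    def worker():
        # Allocate a buffer with known contents.
        data = bytearray(range(256))
        # Touch the buffer and force a collection several times.
        for _ in range(n_iterations):
            for i in range(len(data)):
                data[i] = data[i]
            gc.collect()
        # Record whether the buffer is still intact.
        with lock:
            counts["correct"] += list(data) == list(range(256))
            counts["finished"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A single boolean result: every thread finished with intact data.
    return counts["correct"] == counts["finished"] == n_threads


print(run_gc_thread_check())
```

A False result from a check of this kind means at least one thread saw its live buffer change during a collection, which is why an intermittent False points at the collector or its locking rather than at the test.
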
Estimated per-execution failure rate: ~1.3% across the 8 CI jobs that
run it.
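
One way to arrive at a figure of this size from the numbers above is sketched below. It assumes that each of the ~62 attributed failed runs corresponds to a single failing execution and that all 8 jobs ran the test in every sampled run; the linked analysis may derive the rate differently.

```python
failed_runs_attributed = 62   # from the figures above
runs_sampled = 575            # from the figures above
jobs_running_test = 8         # CI jobs that run the test

# Assumption: every sampled run executed the test once in each of the 8 jobs.
executions = runs_sampled * jobs_running_test
per_execution_rate = failed_runs_attributed / executions

# Assumption: jobs fail independently. Chance that at least one job fails in a run.
per_run_rate = 1 - (1 - per_execution_rate) ** jobs_running_test

print(f"per-execution failure rate: {per_execution_rate:.2%}")
print(f"implied per-run failure rate: {per_run_rate:.1%}")
```

Under these assumptions the per-execution rate lands close to the ~1.3% quoted above, and the implied per-run rate is about one run in ten, roughly in line with 62 of 575 runs.
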
PR #18861 now ignores this failure in CI so it doesn't block other work,
but the underlying issue should be fixed.
See analysis: https://gist.github.com/andrewleech/5686ed5242e0948d8679c432579e002e