Set threshold for Coveralls failure
Because the MicroPython coverage tests behave non-deterministically, many lines of code are executed a different number of times on each CI run. Coveralls takes these hit counts into account when calculating coverage, so we often get "failures" caused by run-to-run variance even though nothing actually changed. Coveralls provides a threshold setting that defines what counts as a failure. Setting it (or increasing it, if it is already set) would reduce the number of nuisance "failures".
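To illustrate why a threshold helps, here is a minimal sketch. It assumes the threshold is compared against the drop in overall coverage percentage between the base and the new build. The function name and the coverage numbers are hypothetical, and this is not Coveralls' actual implementation.

```python
"""Hypothetical sketch: how a coverage-decrease threshold suppresses nuisance
failures. The function name and numbers are illustrative assumptions, not
Coveralls' implementation."""


def coverage_check_fails(base_pct: float, new_pct: float, threshold_pct: float = 0.0) -> bool:
    """Return True if coverage dropped by more than threshold_pct percentage points."""
    drop = base_pct - new_pct
    return drop > threshold_pct


if __name__ == "__main__":
    base = 98.40   # hypothetical coverage of the base build
    noisy = 98.37  # hypothetical build where only non-deterministic hit counts changed
    print(coverage_check_fails(base, noisy))       # True: with no threshold, tiny noise fails
    print(coverage_check_fails(base, noisy, 0.1))  # False: a 0.1-point threshold absorbs it
```

A genuine regression larger than the threshold would still be reported. The threshold only needs to exceed the typical run-to-run variance.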

Codecov does not run reports if any test fails
As documented in https://docs.codecov.com/docs/ci-service-relationship#section-checking-ci-status, Codecov doesn't run reports if any CI test fails. However, the code size test commonly fails when new features are added, and that is arguably when code coverage reports are most useful.
I suggest changing the code size test so that it never fails and instead always posts its report as a pull request comment.
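A minimal sketch of what a non-failing code size step might look like follows. The port names, byte sizes, output file name `size_report.md`, and helper names are all hypothetical, and actually posting the comment is assumed to happen in a separate CI step that is not shown. The key point is that the script reports the size change and always exits with status 0.

```python
"""Hypothetical sketch: a code size check that never fails CI and instead
produces a Markdown report for a pull request comment. Port names, sizes,
and the output file name are illustrative assumptions."""

import sys


def format_size_report(base_sizes: dict[str, int], new_sizes: dict[str, int]) -> str:
    """Build a Markdown table of per-port size differences in bytes."""
    lines = ["| port | base | new | change |", "| --- | ---: | ---: | ---: |"]
    for port in sorted(new_sizes):
        base = base_sizes.get(port, 0)  # a new port is compared against 0 bytes
        new = new_sizes[port]
        lines.append(f"| {port} | {base} | {new} | {new - base:+d} |")
    return "\n".join(lines)


def main() -> int:
    # Hypothetical measurements; a real job would read these from build output.
    base_sizes = {"example_port_a": 1000, "example_port_b": 2000}
    new_sizes = {"example_port_a": 1128, "example_port_b": 2000}
    report = format_size_report(base_sizes, new_sizes)
    # Save the report so a later (assumed) CI step can post it as a PR comment.
    with open("size_report.md", "w", encoding="utf-8") as f:
        f.write(report + "\n")
    print(report)
    # Always succeed: size growth is reported, never treated as a CI failure.
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because this step always succeeds, it no longer prevents Codecov from running its reports, while reviewers still see the size change in the pull request.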