tests: extmod/time_time_ns.py intermittent failure due to CI runner clock precision
The time_time_ns.py test makes assertions about time.time_ns() precision
that intermittently fail on shared CI runners. It was observed once each in
the float and longlong jobs in a 20-run log window, and it accounts for
roughly 7 of 103 failed runs over 14 months.
On shared CI runners the wall clock can have insufficient precision, or
the process can be descheduled between measurements, so the timing
checks evaluate to False instead of True.
This one might be addressable by increasing the tolerance in the test
rather than by changing the underlying code.
PR #18861 now ignores this failure in CI.
See analysis: https://gist.github.com/andrewleech/5686ed5242e0948d8679c432579e002e
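As a rough illustration of the tolerance idea mentioned above, the sketch below samples time.time_ns() around a short sleep and accepts any delta inside a deliberately wide window. The bounds, the helper names delta_ok and measure_once, and the use of time.sleep (CPython) are assumptions for illustration, not the values or code used by the real test.

```python
import time

# Hypothetical bounds, chosen for illustration only; not the values used by
# the real test.
SLEEP_NS = 5_000_000        # requested sleep: 5 ms
MIN_DELTA_NS = 0            # tolerate a coarse clock that has not ticked yet
MAX_DELTA_NS = 500_000_000  # tolerate a long deschedule on a loaded runner


def delta_ok(t0, t1, min_ns=MIN_DELTA_NS, max_ns=MAX_DELTA_NS):
    """Return True if t1 - t0 lies within the inclusive tolerance window."""
    return min_ns <= t1 - t0 <= max_ns


def measure_once():
    """Sample time.time_ns() around a short sleep and return both readings."""
    t0 = time.time_ns()
    time.sleep(SLEEP_NS / 1_000_000_000)
    t1 = time.time_ns()
    return t0, t1


if __name__ == "__main__":
    t0, t1 = measure_once()
    print(delta_ok(t0, t1))
```

Widening the window trades sensitivity for stability: the check still catches a clock that runs backwards or jumps by seconds, but no longer fails on ordinary scheduler jitter.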
tests/extmod: Make test time_res.py more deterministic.
Summary
This PR improves the determinism of tests/extmod/time_res.py, which has been failing intermittently on Windows CI.
Problem: The test was counting unique values returned by time functions over a 2.5-second window, expecting at least 3 unique values for second-resolution functions (gmtime(), localtime()). This approach had fundamental issues:
- Clock source mismatch: Used ticks_ms() (system tick timer) to measure test duration while sampling gmtime()/localtime() (RTC). On embedded platforms these are different hardware clocks that can drift relative to each other.
- Race conditions: To see 3 unique second values in a 2.5-second window, the RTC must advance by >2.0 seconds. Due to timing overhead and clock drift, the test could observe only 2.4 seconds of RTC time, causing spurious failures.
- Windows-specific issues: Windows system clock has ~15ms granularity, making the sample-counting approach particularly unreliable.
Recent CI failures:
- https://github.com/micropython/micropython/actions/runs/18826145552/job/53709126186
- https://github.com/micropython/micropython/actions/runs/18861478955/job/53820524197
Attempted Solution: Replace with direct resolution and bounds testing:
- For each time function, measure value before sleep, sleep appropriate duration, measure after
- Verify the function advanced within expected bounds (80%-200% of sleep time)
- Lower bound checks proper resolution, upper bound catches broken implementations
- 2x upper tolerance handles loaded CI systems while catching real problems
- Eliminates clock drift issues by using appropriate sleep duration for each clock source
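The before/sleep/after check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: within_bounds, check_clock and their parameters are hypothetical names, and time.monotonic_ns stands in for a tick counter when run under CPython.

```python
import time

# Percentages taken from the description above.
LOWER_PERCENT = 80
UPPER_PERCENT = 200


def within_bounds(elapsed, expected):
    """Return True if elapsed is within 80%..200% of expected (inclusive)."""
    # Integer arithmetic avoids float rounding at the boundaries.
    return LOWER_PERCENT * expected <= 100 * elapsed <= UPPER_PERCENT * expected


def check_clock(read_clock, sleep_ms, units_per_ms, sleep=time.sleep):
    """Read a clock before and after a sleep and check how far it advanced.

    read_clock, units_per_ms and sleep are hypothetical parameters; sleep is
    injectable so the check can be exercised without waiting.
    """
    before = read_clock()
    sleep(sleep_ms / 1000)
    after = read_clock()
    return within_bounds(after - before, sleep_ms * units_per_ms)


if __name__ == "__main__":
    # CPython stand-in for a nanosecond tick counter.
    print(check_clock(time.monotonic_ns, 150, 1_000_000))
```

Because the same clock is read on both sides of the sleep, the result does not depend on how a second clock source relates to it.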
Testing
The test logic here is hopefully more robust:
- Second-resolution functions (time(), gmtime(), localtime()): Sleep 1200ms and verify value changed within 1-2.4 second range
- Tick functions (ticks_ms, ticks_us, ticks_ns): Sleep 150ms and verify advanced within 80%-200% of expected
- Upper bound checking: 2x tolerance handles loaded CI systems while catching broken implementations
- Platform handling: Gracefully handles platforms where ticks_cpu returns 0
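One way to picture the per-function plan listed above is a small table of sleep durations plus a classifier for each before/after pair. The sketch below is an assumption-laden illustration: the PLAN layout, the classify helper, and the "skip" result for a counter that always reads 0 are hypothetical, and readings from gmtime()/localtime() are assumed to have been converted to integer seconds beforehand.

```python
# Sleep durations are taken from the description; the tuple layout is
# hypothetical: (function name, sleep in ms, clock units per second).
PLAN = [
    ("time", 1200, 1),
    ("gmtime", 1200, 1),
    ("localtime", 1200, 1),
    ("ticks_ms", 150, 1_000),
    ("ticks_us", 150, 1_000_000),
    ("ticks_ns", 150, 1_000_000_000),
]


def classify(before, after, sleep_ms, units_per_second):
    """Classify one before/after pair as 'skip', 'ok' or 'fail'."""
    if before == 0 and after == 0:
        # Assumed handling for a counter that is not implemented and always
        # reads 0.
        return "skip"
    elapsed = after - before
    # Integer form of 0.8 * expected <= elapsed <= 2.0 * expected, where
    # expected = sleep_ms * units_per_second / 1000.
    expected_scaled = sleep_ms * units_per_second
    elapsed_scaled = elapsed * 1000
    if 80 * expected_scaled <= 100 * elapsed_scaled <= 200 * expected_scaled:
        return "ok"
    return "fail"
```

For a second-resolution clock and a 1200ms sleep, this accepts an advance of 1 or 2 seconds and rejects 0 or 3, which matches the 1-2.4 second range given above.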
This approach aims to test the actual contract of each function (proper time resolution within bounds) rather than a proxy metric (sample counts in a specified window).
The test should hopefully now pass reliably on all platforms including Windows, Unix, and embedded targets without platform-specific skip lists.