QUERY · ISSUE

tests: extmod/time_time_ns.py intermittent failure due to CI runner clock precision

open · by andrewleech · opened 2026-02-25 · updated 2026-02-25

The time_time_ns.py test makes assertions about time.time_ns() precision
that intermittently fail on shared CI runners. Observed in the float (1)
and longlong (1) jobs in a 20-run log window. Attributed to ~7 of 103
failed runs over 14 months.

On shared CI runners the wall clock can have insufficient precision or
the process can be descheduled between measurements, causing timing
assertions to return False instead of True.

This one might be addressable by increasing the tolerance in the test
rather than fixing underlying code.
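
As a rough illustration of the tolerance idea, the sketch below assumes the test samples time.time_ns() before and after a short sleep and checks the difference against fixed bounds. The actual assertions in time_time_ns.py are not quoted in this issue, so the helper name delta_within, the sleep length, and every bound shown here are hypothetical.

```python
import time


def delta_within(t0_ns, t1_ns, min_ns, max_ns):
    """Return True if the elapsed time t1_ns - t0_ns lies in [min_ns, max_ns]."""
    return min_ns <= t1_ns - t0_ns <= max_ns


# Hypothetical values for a 5 ms sleep; none are taken from the real test.
SLEEP_S = 0.005
MIN_NS = 2_000_000            # tolerate a coarse clock reporting less than 5 ms
STRICT_MAX_NS = 50_000_000    # tight upper bound, sensitive to descheduling
LENIENT_MAX_NS = 500_000_000  # widened upper bound for shared CI runners

if __name__ == "__main__":
    t0 = time.time_ns()
    time.sleep(SLEEP_S)
    t1 = time.time_ns()
    print(delta_within(t0, t1, MIN_NS, LENIENT_MAX_NS))
```

A wider upper bound trades sensitivity for stability: a process descheduled for 80 ms would fail the strict check but pass the lenient one, while a clock that did not advance at all would still fail both.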

PR #18861 now ignores this failure in CI.

See analysis: https://gist.github.com/andrewleech/5686ed5242e0948d8679c432579e002e

CANDIDATE · PULL REQUEST

tests/extmod: Make test time_res.py more deterministic.

open · by andrewleech · opened 2025-10-30 · updated 2025-11-03

tests

Summary

This PR improves the determinism of tests/extmod/time_res.py which has been failing intermittently on Windows CI.

Problem: The test was counting unique values returned by time functions over a 2.5-second window, expecting at least 3 unique values for second-resolution functions (gmtime(), localtime()). This approach had fundamental issues:

Clock source mismatch: Used ticks_ms() (system tick timer) to measure test duration while sampling gmtime()/localtime() (RTC). On embedded platforms these are different hardware clocks that can drift relative to each other.
Race conditions: To see 3 unique second values in a 2.5-second window, the RTC must advance by >2.0 seconds. Due to timing overhead and clock drift, the test could observe only 2.4 seconds of RTC time, causing spurious failures.
Windows-specific issues: Windows system clock has ~15ms granularity, making the sample-counting approach particularly unreliable.

Recent CI failures:

https://github.com/micropython/micropython/actions/runs/18826145552/job/53709126186
https://github.com/micropython/micropython/actions/runs/18861478955/job/53820524197

Attempted Solution: Replace with direct resolution and bounds testing:

For each time function, measure value before sleep, sleep appropriate duration, measure after
Verify the function advanced within expected bounds (80%-200% of sleep time)
Lower bound checks proper resolution, upper bound catches broken implementations
2x upper tolerance handles loaded CI systems while catching real problems
Eliminates clock drift issues by using appropriate sleep duration for each clock source
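
The steps above can be sketched roughly as follows. This is not the code from the PR: the helper names advanced_within_bounds and check_clock are hypothetical, and CPython's time.monotonic_ns stands in for a MicroPython tick function. Percentages are kept as integers so the comparison stays exact and does not depend on float support.

```python
import time


def advanced_within_bounds(before, after, expected, low_pct=80, high_pct=200):
    """Return True if (after - before) is within low_pct..high_pct percent of expected.

    All values are in the clock's own unit (ms, us, ns, ...).
    """
    elapsed = after - before
    return expected * low_pct <= elapsed * 100 <= expected * high_pct


def check_clock(read_clock, sleep, sleep_arg, expected):
    """Sample a clock, sleep, sample again, then apply the bounds check.

    read_clock and sleep are passed in so the logic can be exercised with a
    fake clock. sleep_arg is in whatever unit `sleep` accepts; expected is
    the same duration expressed in the clock's unit.
    """
    before = read_clock()
    sleep(sleep_arg)
    after = read_clock()
    return advanced_within_bounds(before, after, expected)


if __name__ == "__main__":
    # CPython stand-in for a tick counter: 150 ms sleep, nanosecond clock.
    print(check_clock(time.monotonic_ns, time.sleep, 0.150, 150_000_000))
```

Because the sleep and the sampled clock are supplied together per call, each clock source can be paired with a sleep duration suited to its resolution.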

Testing

The test logic here is hopefully more robust:

Second-resolution functions (time(), gmtime(), localtime()): Sleep 1200ms and verify value changed within 1-2.4 second range
Tick functions (ticks_ms, ticks_us, ticks_ns): Sleep 150ms and verify advanced within 80%-200% of expected
Upper bound checking: 2x tolerance handles loaded CI systems while catching broken implementations
Platform handling: Gracefully handles platforms where ticks_cpu returns 0
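
For the second-resolution and ticks_cpu cases, a minimal sketch under stated assumptions might look like the following. The function names are hypothetical, and the "skip" result for ticks_cpu is only one possible reading of "gracefully handles", since the PR text does not say what the test does in that case. calendar.timegm is the CPython way to turn a gmtime() tuple into seconds; a MicroPython port would more likely use time.mktime.

```python
import calendar
import time


def utc_tuple_to_seconds(t):
    """Convert a UTC time tuple, as returned by time.gmtime(), to whole seconds."""
    return calendar.timegm(t)


def seconds_delta_ok(before_s, after_s, min_s=1, max_ms=2400):
    """Check a whole-second clock advanced by 1 s to 2.4 s across a 1200 ms sleep.

    With integer seconds this accepts a delta of 1 or 2 and rejects 0 or 3+.
    """
    delta_s = after_s - before_s
    return delta_s >= min_s and delta_s * 1000 <= max_ms


def ticks_cpu_status(before, after):
    """Hypothetical ticks_cpu handling: skip when the port only ever returns 0."""
    if before == 0 and after == 0:
        return "skip"
    return "pass" if after != before else "fail"
```

The 1200 ms sleep guarantees that a correct whole-second clock crosses at least one second boundary, which is why a delta of 0 is treated as a failure.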

This approach aims to test the actual contract of each function (proper time resolution within bounds) rather than a proxy metric (sample counts in a specified window).

The test should hopefully now pass reliably on all platforms including Windows, Unix, and embedded targets without platform-specific skip lists.
