Pyboard SysTick losing ticks: interrupts sometimes locked out for 170ms
I've spent a couple of days chasing a timing bug and I think I've nailed the cause: the SysTick handler appears to sporadically miss out a chunk of time (about 170ms). I checked this by modifying the SysTick handler to monitor itself:
uint32_t tick_timestamp;
uint32_t last_tick_timestamp;
uint32_t tick_overruns = 0;
uint32_t last_overrun;
/**
* @brief This function handles SysTick Handler.
* @param None
* @retval None
*/
void SysTick_Handler(void) {
// Instead of calling HAL_IncTick we do the increment here of the counter.
// This is purely for efficiency, since SysTick is called 1000 times per
// second at the highest interrupt priority.
extern volatile uint32_t uwTick;
last_tick_timestamp = tick_timestamp;
tick_timestamp = DWT->CYCCNT;
if((tick_timestamp - last_tick_timestamp) > 336000U) { /* 2ms late */
last_overrun = tick_timestamp - last_tick_timestamp;
tick_overruns++;
}
uwTick += 1;
Basically it uses the debug CPU clock counter to see how long since it last ran. Obviously this is normally around 168000 clocks (i.e. 1ms), sometimes a bit less, sometimes a bit more (because of ISR jitter). But every now and then it misses a huge chunk of time. Here's the output of one of my test programs:
ERROR: elapsed_ms and ms_delta mismatch, margin=169
overruns=11, overrun by=168001(28862777)
There are a bunch of overruns at boot then everything is fine for a while. Then (in the above example) overrun 11 hit. The key number above is 28862777: when divided by / 168000 it is 171.8ms. Basically the SysTick handler itself sees that it hasn't run for a big chunk of time. Which can only happen if something is locking out all interrupts for a massive amount of time or SysTick has been disabled. I'm pretty sure there's nothing disabling it.
I wondered what might be locking out interrupts for such a huge chunk of time so I wrote a minimal test program:
from pyb import delay, rng
from time import ticks_cpu
def timestamps():
start = ticks_cpu()
print("Starting with {}".format(start))
while True:
delay_time = rng() % 200
print("Delay time = {}ms".format(delay_time))
before = ticks_cpu() & 0x3fffffff
delay(delay_time)
after = ticks_cpu() & 0x3fffffff
realtime = (after - before) & 0x3fffffff
realtime_ms = realtime / 168000
if abs(realtime_ms - delay_time) > 1:
print("ERROR: delay of {}ms actually took {}ms ({:08x},{:08x} = {})".format(delay_time, realtime_ms, after, before, after - before))
return
I apply masks above to use only the bottom 30 bits to get modulo arithmetic working right for calculating delays.
I had to modify the ticks_cpu() function to return a longer range than just a small int (the clock runs so fast 16 bits enough):
STATIC mp_obj_t time_ticks_cpu(void) {
static bool enabled = false;
if (!enabled) {
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
DWT->CYCCNT = 0;
DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
enabled = true;
}
return mp_obj_new_int_from_uint(DWT->CYCCNT);
}
STATIC MP_DEFINE_CONST_FUN_OBJ_0(time_ticks_cpu_obj, time_ticks_cpu);
Here's an example run:
Delay time = 22ms
Delay time = 105ms
Delay time = 199ms
Delay time = 170ms
ERROR: delay of 170ms actually took 340.4643ms (1b10cd27,17a80774 = 57198003)
This affects all timing that is based on SysTick. So delay(), millis(), etc. all go wrong.
I investigated further into what is happening and I know what at least one of the culprits is: the flash file system mounted on USB. If you run this test program then copy something to the file system it trips the bug. But it also trips at random (which may be the host OS - Ubuntu in my case - touching the file system for indexing or something).
stm32: mp_hal_ticks_us returns non-monotonic values.
Port, board and/or hardware
stm32, STM32N6.
MicroPython version
MicroPython 1.26.0
Reproduction
diff --git a/ports/stm32/main.c b/ports/stm32/main.c
index af4d7f8bb..7583148ee 100644
--- a/ports/stm32/main.c
+++ b/ports/stm32/main.c
@@ -549,6 +549,17 @@ void stm32_main(uint32_t reset_mode) {
MICROPY_BOARD_BEFORE_SOFT_RESET_LOOP(&state);
+ while (true) {
+ mp_uint_t last_tick = mp_hal_ticks_us();
+ SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;
+
+ if (mp_hal_ticks_us() < last_tick) {
+ MICROPY_BOARD_FATAL_ERROR("");
+ }
+
+ SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;
+ }
+
soft_reset:
MICROPY_BOARD_TOP_SOFT_RESET_LOOP(&state);
Expected behaviour
Time is a river that never bends back.
Observed behaviour
mp_hal_ticks_us values can go backward by up to ~1 ms.
Additional Information
mp_hal_ticks_us values can go backward by up to ~1 ms, especially on faster MCUs (like the N6) and when IRQs occur around the calls. You’re probably aware of this possibility, judging by the comments in that function, so what I’m looking for is advice on whether this is something we should try to fix or just accept. Right now I'm using this workaround:
static inline mp_uint_t mp_hal_ticks_us_monotonic(void) {
static mp_uint_t last_timestamp = 0;
mp_uint_t current = mp_hal_ticks_us();
// Ensure monotonic behavior
if (current > last_timestamp) {
last_timestamp = current;
}
return last_timestamp;
}
However, I'm using mp_hal_ticks_us as part of a code profiler, so accuracy is very important. The following function seems to fix the issue, and it might also be better because it doesn't disable IRQs. However, I might be missing something, or this loop may not be wanted, that's why I didn't send this as a fix:
mp_uint_t mp_hal_ticks_us(void) {
uint32_t ms1, ms2, counter, load, ctrl;
// Repeat until we get a consistent millisecond + VAL snapshot
do {
ms1 = HAL_GetTick();
counter = SysTick->VAL;
ctrl = SysTick->CTRL; // Read CTRL to check COUNTFLAG
ms2 = HAL_GetTick();
} while (ms1 != ms2);
load = SysTick->LOAD;
// Check if SysTick rolled over but HAL_GetTick() hasn't been updated yet
// This happens when IRQs are disabled or the interrupt is pending
if (ctrl & SysTick_CTRL_COUNTFLAG_Msk) {
// SysTick has rolled over since last interrupt
// The counter value is from the new cycle, so increment milliseconds
ms1++;
}
// Convert from decrementing to incrementing
counter = load - counter;
return ms1 * 1000 + (counter * 1000) / (load + 1);
}
Code of Conduct
Yes, I agree