← index #17875PR #17363
Related · high · value 0.513
QUERY · ISSUE

wiznet5k driver can call scheduler from hard IRQ context

openby tpwrulesopened 2025-08-09updated 2025-08-10
bug

Port, board and/or hardware

W5500-EVB-Pico

MicroPython version

MicroPython v1.25.0 on 2025-04-15; W5500-EVB-Pico with RP2040 (official downloaded .uf2)

Reproduction

  1. Put code below into a file
  2. Disconnect Ethernet from board and hard reset it and ensure it doesn't have any files that run on boot
  3. Run the code with mpremote run
import network
from machine import Timer
import micropython
micropython.alloc_emergency_exception_buf(100)

print("starting...")
l = [[] for _ in range(5000)] # make GC take longer

nic = network.WIZNET5K()
nic.active(True)

def timer_cb(x):
    data = bytearray(1024) # stress GC from scheduler
timer = Timer(freq=100, callback=timer_cb) # soft IRQ only on rp2

print("running...")
while True:
    data = bytearray(1024) # stress GC from main loop

Expected behaviour

The code runs indefinitely.

Observed behaviour

After several seconds, the following exception appears:

Traceback (most recent call last):
  File "<stdin>", line 13, in timer_cb
MemoryError: memory allocation failed, heap is locked

Sometimes conditions are wrong and it will not appear and a fresh reset is needed.

Additional Information

Also reproduces on 1.26.0.

It appears that the WIZNET5K Ethernet driver will call the MicroPython scheduler through here if blocking sends or receives are performed, but those can occur from PendSV timer callbacks, which can run while the GC is locked. Not sure of an appropriate resolution.

Backtrace from SWD (issue reproduces on a standard build but the custom build is what I have symbols for):
<details>

Breakpoint 1, m_malloc_fail (num_bytes=num_bytes@entry=16) at /build/source/py/runtime.c:1656
1656	        mp_raise_msg(&mp_type_MemoryError, MP_ERROR_TEXT("memory allocation failed, heap is locked"));
(gdb) bt
#0  m_malloc_fail (num_bytes=num_bytes@entry=16) at /build/source/py/runtime.c:1656
#1  0x1000e3cc in m_malloc (num_bytes=16) at /build/source/py/malloc.c:88
#2  m_malloc (num_bytes=num_bytes@entry=16) at /build/source/py/malloc.c:85
#3  0x10013240 in array_new (typecode=typecode@entry=1 '\001', n=n@entry=1024) at /build/source/py/objarray.c:99
#4  0x10013790 in bytearray_make_new (type_in=<optimized out>, n_args=1, n_kw=<optimized out>, args=<optimized out>) at /build/source/py/objarray.c:191
#5  0x1001b2fc in type_call (self_in=<optimized out>, n_args=<optimized out>, n_kw=<optimized out>, args=<optimized out>) at /build/source/py/objtype.c:1017
#6  0x1001ded4 in mp_call_function_n_kw (fun_in=0x10070c9c <mp_type_bytearray>, n_args=1, n_kw=0, args=args@entry=0x20041ac8) at /build/source/py/runtime.c:726
#7  0x10020120 in mp_execute_bytecode (code_state=code_state@entry=0x20041ab0, inject_exc=inject_exc@entry=0x0) at /build/source/py/vm.c:957
#8  0x10015784 in fun_bc_call (self_in=0x2000c2a0, n_args=1, n_kw=0, args=0x20041b2c) at /build/source/py/objfun.c:295
#9  0x1001ded4 in mp_call_function_n_kw (fun_in=0x2000c2a0, n_args=1, n_kw=0, args=0x20041b2c) at /build/source/py/runtime.c:726
#10 0x1001df6a in mp_call_function_1 (fun=fun@entry=0x2000c2a0, arg=<optimized out>, arg@entry=0x2003c190) at /build/source/py/runtime.c:704
#11 0x1001ed6e in mp_call_function_1_protected (fun=fun@entry=0x2000c2a0, arg=arg@entry=0x2003c190) at /build/source/py/runtime_utils.c:33
#12 0x1001ef5a in mp_sched_run_pending () at /build/source/py/scheduler.c:111
#13 mp_handle_pending (raise_exc=raise_exc@entry=true) at /build/source/py/scheduler.c:246
#14 0x1002b314 in mpy_wiznet_yield () at /build/source/extmod/network_wiznet5k.c:146
#15 0x100351fe in wizchip_sendto (sn=sn@entry=0 '\000', buf=buf@entry=0x20003425 <wiznet5k_obj+29> "\377\377\377\377\377\377\002\300\v\214c\232\b", len=len@entry=350, 
    addr=addr@entry=0x20041bcc "\001\001\001\001\b4", port=port@entry=11) at /build/source/lib/wiznet5k/Ethernet/socket.c:619
#16 0x1002ae18 in wiznet5k_send_ethernet (self=self@entry=0x20003408 <wiznet5k_obj>, len=350, buf=buf@entry=0x20003425 <wiznet5k_obj+29> "\377\377\377\377\377\377\002\300\v\214c\232\b")
    at /build/source/extmod/network_wiznet5k.c:255
#17 0x1002b044 in wiznet5k_netif_output (netif=<optimized out>, p=0x20008858 <ram_heap+345>) at /build/source/extmod/network_wiznet5k.c:292
#18 0x10048fd0 in ethernet_output (netif=netif@entry=0x20003a14 <wiznet5k_obj+1548>, p=p@entry=0x20008858 <ram_heap+345>, src=src@entry=0x20003afe <wiznet5k_obj+1782>, dst=0x1007e556 <ethbroadcast>, 
    eth_type=eth_type@entry=2048) at /build/source/lib/lwip/src/netif/ethernet.c:320
#19 0x1003dd2c in etharp_output (netif=0x20003a14 <wiznet5k_obj+1548>, q=0x20008858 <ram_heap+345>, ipaddr=0x1007e3bc <ip_addr_broadcast>) at /build/source/lib/lwip/src/core/ipv4/etharp.c:897
#20 0x1003e99a in ip4_output_if_opt_src (p=p@entry=0x20008858 <ram_heap+345>, src=src@entry=0x1007e3d4 <ip_addr_any>, dest=dest@entry=0x1007e3bc <ip_addr_broadcast>, ttl=255 '\377', 
    tos=tos@entry=0 '\000', proto=proto@entry=17 '\021', netif=netif@entry=0x20003a14 <wiznet5k_obj+1548>, ip_options=ip_options@entry=0x0, optlen=optlen@entry=0)
    at /build/source/lib/lwip/src/core/ipv4/ip4.c:1041
#21 0x1003ea20 in ip4_output_if_src (p=p@entry=0x20008858 <ram_heap+345>, src=src@entry=0x1007e3d4 <ip_addr_any>, dest=dest@entry=0x1007e3bc <ip_addr_broadcast>, ttl=<optimized out>, tos=0 '\000', 
    proto=proto@entry=17 '\021', netif=netif@entry=0x20003a14 <wiznet5k_obj+1548>) at /build/source/lib/lwip/src/core/ipv4/ip4.c:866
#22 0x10048af6 in udp_sendto_if_src (pcb=0x20008638 <memp_memory_UDP_PCB_base+243>, p=p@entry=0x20008858 <ram_heap+345>, dst_ip=0x1007e3bc <ip_addr_broadcast>, dst_port=dst_port@entry=67, 
    netif=netif@entry=0x20003a14 <wiznet5k_obj+1548>, src_ip=0x1007e3d4 <ip_addr_any>) at /build/source/lib/lwip/src/core/udp.c:893
#23 0x1003ccc0 in dhcp_discover (netif=0x20003a14 <wiznet5k_obj+1548>) at /build/source/lib/lwip/src/core/ipv4/dhcp.c:1059
#24 0x1003d776 in dhcp_timeout (netif=0x20003a14 <wiznet5k_obj+1548>) at /build/source/lib/lwip/src/core/ipv4/dhcp.c:602
#25 dhcp_fine_tmr () at /build/source/lib/lwip/src/core/ipv4/dhcp.c:564
#26 0x100482f0 in lwip_cyclic_timer (arg=0x1007e518 <lwip_cyclic_timers+32>) at /build/source/lib/lwip/src/core/timeouts.c:241
#27 0x10048440 in sys_check_timeouts () at /build/source/lib/lwip/src/core/timeouts.c:390
#28 0x100370bc in mp_network_soft_timer_callback (self=<optimized out>) at /build/source/ports/rp2/mpnetworkport.c:135
#29 0x10033f3c in soft_timer_handler () at /build/source/shared/runtime/softtimer.c:93
#30 0x1003736a in PendSV_Handler () at /build/source/ports/rp2/pendsv.c:177
#31 <signal handler called>
#32 0x1000d2aa in gc_deal_with_stack_overflow () at /build/source/py/gc.c:558
#33 gc_collect_end () at /build/source/py/gc.c:536
#34 0x10036dba in gc_collect () at /build/source/ports/rp2/main.c:288
#35 0x1000d50c in gc_alloc (n_bytes=n_bytes@entry=1024, alloc_flags=alloc_flags@entry=0) at /build/source/py/gc.c:840
#36 0x1000e3be in m_malloc (num_bytes=num_bytes@entry=1024) at /build/source/py/malloc.c:86
#37 0x1001325e in array_new (typecode=typecode@entry=1 '\001', n=n@entry=1024) at /build/source/py/objarray.c:110
#38 0x10013790 in bytearray_make_new (type_in=<optimized out>, n_args=1, n_kw=<optimized out>, args=<optimized out>) at /build/source/py/objarray.c:191
#39 0x1001b2fc in type_call (self_in=<optimized out>, n_args=<optimized out>, n_kw=<optimized out>, args=<optimized out>) at /build/source/py/objtype.c:1017
#40 0x1001ded4 in mp_call_function_n_kw (fun_in=0x10070c9c <mp_type_bytearray>, n_args=1, n_kw=0, args=args@entry=0x20041ec0) at /build/source/py/runtime.c:726
#41 0x10020120 in mp_execute_bytecode (code_state=code_state@entry=0x20041ea8, inject_exc=inject_exc@entry=0x0) at /build/source/py/vm.c:957
#42 0x10015784 in fun_bc_call (self_in=0x2000ccd0, n_args=0, n_kw=0, args=0x0) at /build/source/py/objfun.c:295
#43 0x1001ded4 in mp_call_function_n_kw (fun_in=0x2000ccd0, n_args=0, n_kw=0, args=0x0) at /build/source/py/runtime.c:726
#44 0x1001deec in mp_call_function_0 (fun=fun@entry=0x2000ccd0) at /build/source/py/runtime.c:700
#45 0x10033aa2 in parse_compile_execute (source=source@entry=0x20041f94, input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, exec_flags=exec_flags@entry=65) at /build/source/shared/runtime/pyexec.c:119
#46 0x10033c02 in do_reader_stdin (c=65) at /build/source/shared/runtime/pyexec.c:276
#47 pyexec_raw_repl () at /build/source/shared/runtime/pyexec.c:509
#48 0x10036d58 in main (argc=<optimized out>, argv=<optimized out>) at /build/source/ports/rp2/main.c:232

</details>

Code of Conduct

Yes, I agree

CANDIDATE · PULL REQUEST

rp2/machine_timer: Support hard IRQ timer callbacks.

mergedby arachsysopened 2025-05-26updated 2025-06-16
port-rp2

Summary

When using Timer on the pyboard (and stm32 more generally), the callback is run in hard IRQ context and is reliably punctual.

On a Pi Pico 2 board, I was surprised to find that the rp2 port instead queues timer callbacks via mp_sched_schedule() which means they are subject to GC delays and potentially unbounded jitter.

This probably makes sense as an 'easy mode' default, allowing allocation and less disciplined code in a timer callback, but having it as the only option makes tight timing more awkward on rp2 than it is on stm32. Searching discussions, some have worked around the unpunctual timers by repurposing a PWM output with a hard trigger IRQ. A PIO state machine in an irq() loop with a hard IRQ handler would also do the job, but in each case we're wasting limited resources and adding complexity.

Rather than hacking around the problem once again, it feels sensible to add a (non-default) option to make the rp2 timers dispatch callbacks in hard IRQ context in the same way stm32 timers do.

Add a hard= parameter to the Timer() constructor and init, matching the hard= parameter on Pin.irq() and rp2.DMA.irq() and rp2.PIO.irq(). When set true, this dispatches the callback in hard IRQ context in the same way as the other hard IRQ handlers for rp2 and the timer callbacks for stm32, taking the same precautions to lock the scheduler and GC, catch unhandled exceptions, and disable the callback on unhandled exceptions.

Does this make sense? I've tested this reasonably heavily in my own application without uncovering any problems (see below for a demo) but would be grateful for any thoughts. For example, should this patch make the behaviour configurable in the other ports at the same time? Obviously, I'll update the patch to update the documentation to match wherever it ends up as well.

Testing

To demonstrate the GC jitter with the default soft callbacks on a Pi Pico or Pico 2, try:

from machine import Pin, Timer

pin = machine.Pin(0, machine.Pin.OUT)
timer = Timer(freq = 1000, mode = Timer.PERIODIC, callback = lambda t: pin.toggle())

x = 0.3
while True:
    x = 4 * x * (1 - x)

with a scope on pin GP0. The fp loop allocates and gcs heavily, so you'll see the 500Hz square wave often lengthen out from 1 ms to 2-4 ms when the callback is delayed.

With this patch and the new hard callbacks, this effect disappears and the jitter is down in the single digit microseconds:

from machine import Pin, Timer

pin = machine.Pin(0, machine.Pin.OUT)
timer = Timer(freq = 1000, mode = Timer.PERIODIC, callback = lambda t: pin.toggle(), hard = True)

x = 0.3
while True:
    x = 4 * x * (1 - x)

I've tested with multiple concurrent timers of disjoint frequencies up to 20kHz (pushing towards 100% utilisation in interrupt handling) on a variety of different RP2350 boards.

Keyboard

j / / n
next pair
k / / p
previous pair
1 / / h
show query pane
2 / / l
show candidate pane
c
copy suggested comment
r
toggle reasoning
g i
go to index
?
show this help
esc
close overlays

press ? or esc to close

copied