#18559 · Issue #17815
Related · medium · value 1.730
QUERY · ISSUE

Crash/segfault in async code

open · by smurfix · opened 2025-12-12 · updated 2025-12-31
bug

Port, board and/or hardware

Unix

MicroPython version

MicroPython v1.27.0-18.g3bca4bdbde

The commit is the TaskGroup branch https://github.com/micropython/micropython/pull/8791 (possibly outdated, I need to do some cleanup, but nothing material).

Reproduction

Unfortunately I have not yet found a small(ish)-enough reproducer.

Steps to re-create the problem:

  • apt install git build-essential python3-dev libffi-dev pkg-config # or equivalent
  • git clone -b bughunt --depth 15 https://github.com/M-o-a-T/moat.git
  • git submodule update --init --recursive
  • pip install -r requirements.txt # or equivalent
  • (cd ext/micropython/ports/unix; make DEBUG=1 COPT=-O0 CSUPEROPT=-O0 ) # CSUPEROPT is required to debug vm.c
  • pytest -sx tests/moat_micro/test_micro.py::test_iter_m

The last step starts micropython and emits a command line for GDB. MicroPython segfaults shortly after you continue.

Expected behaviour

The test passes, or at the very least does not crash MicroPython.

Observed behaviour

Program received signal SIGSEGV, Segmentation fault.
0x0000558b239997e0 in mp_obj_exception_add_traceback (self_in=0x7f669081dc80, file=1994,
    line=357, block=138) at ../../py/objexcept.c:636
636             self->traceback_len += TRACEBACK_ENTRY_LEN;
(gdb) whe
#0  0x000055bb494e8960 in mp_obj_exception_add_traceback (self_in=0x7fdd5d05a840, file=1994,
    line=357, block=138) at ../../py/objexcept.c:636
#1  0x000055bb4957cb5d in mp_execute_bytecode (code_state=0x7fdd5d05a770,
    inject_exc=<optimized out>) at ../../py/vm.c:1461
#2  0x000055bb494ee461 in mp_obj_gen_resume (self_in=0x7fdd5d05a760, send_value=0x6,
    throw_value=0x0, ret_val=0x7ffd379fbe38) at ../../py/objgenerator.c:210
#3  0x000055bb494adce1 in mp_resume (self_in=0x7fdd5d05a760, send_value=0x6, throw_value=0x0,
    ret_val=0x7ffd379fbe38) at ../../py/runtime.c:1424
#4  0x000055bb4957e447 in mp_execute_bytecode (code_state=0x7fdd5d04af50,
    inject_exc=<optimized out>) at ../../py/vm.c:1230
#5  0x000055bb494ee461 in mp_obj_gen_resume (self_in=0x7fdd5d04af40, send_value=0x6,
    throw_value=0x0, ret_val=0x7ffd379fbfe8) at ../../py/objgenerator.c:210
#6  0x000055bb494adce1 in mp_resume (self_in=0x7fdd5d04af40, send_value=0x6, throw_value=0x0,
    ret_val=0x7ffd379fbfe8) at ../../py/runtime.c:1424
...
(gdb) inf locals
self = 0x55bb496bd140 <native_base_init_wrapper_obj>
tb_data = 0x55bb4954d6d9 <native_base_init_wrapper>
(gdb) p *self
$5 = {base = {type = 0x55bb496bb9b0 <mp_type_fun_builtin_var>}, traceback_alloc = 262143,
  traceback_len = 0, traceback_data = 0x55bb4954d6d9 <native_base_init_wrapper>, args = 0x0}

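Needless to say, this is not supposed to happen: self points at a built-in function object (native_base_init_wrapper_obj, type mp_type_fun_builtin_var), not at an exception instance.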

(gdb) fr 1
#1  0x0000558b23a34e49 in mp_execute_bytecode (code_state=0x7f669081dbb0, inject_exc=0x0)
    at ../../py/vm.c:1461
1461	               mp_obj_exception_add_traceback(MP_OBJ_FROM_PTR(nlr.ret_val), source_file, source_line, block_name);
(gdb) inf locals
bytecode_start = 0x7f66907c52cf "\260\023y\022 \357CH\022\201E#\0064\001e\260\023i\022\034\334DC\022#e\260\024z6"
n_pos_args = 1
bc = 26
block_name = 138
n_kwonly_args = 0
line_info_top = 0x7f66907c52cf "\260\023y\022 \357CH\022\201E#\0064\001e\260\023i\022\034\334DC\022#e\260\024z6"
ip = 0x7f66907c52c8 "\220a(((#)\260\023y\022 \357CH\022\201E#\0064\001e\260\023i\022\034\334DC\022#e\260\024z6"
n_state = 8
n_exc_stack = 0
scope_flags = 7
n_def_pos_args = 0
n_info = 10
n_cell = 0
source_file = 1994
source_line = 357
nlr = {prev = 0x7fff5e2b9da0, ret_val = 0x7f669081dc80, regs = {
    0x558b23a2d77b <mp_execute_bytecode+186>, 0x7fff5e2b9cb0, 0x7fff5e2b97c8, 0x7f669080ea00,
    0x0, 0x7fff5e2bf5e0, 0x7f6690ce1000 <_rtld_global>, 0x558b23b685d8}}
entry_table = {0x558b23a335a5 <mp_execute_bytecode+24292> <repeats 16 times>,
... ... ...
  0x558b23a335a5 <mp_execute_bytecode+24292>, 0x558b23a335a5 <mp_execute_bytecode+24292>}
fastn = 0x7f669081dc10
exc_stack = 0x7f669081dc18
exc_sp = 0x7f669081dc00
__PRETTY_FUNCTION__ = "mp_execute_bytecode"

Additional Information

I tried 1.24 through 1.26 and got the same problem.

This is 100% reproducible on my system (Debian Trixie, amd64). Running valgrind on MicroPython doesn't report anything suspicious.

The code that triggers the problem is in moat/micro/_embed/lib/moat/lib/cmd/msg.py, line 357 (marked with ***):

    async def send(self, *a, **kw) -> None:  # noqa: D102
        if not self._dir & SD_OUT:
            raise RuntimeError("This stream is read only")
        if self._stream_out != S_ON:
***         raise NoStream
        await self._skipped()
        await self.ml_send(a, kw, B_STREAM)

It is called from moat/micro/_embed/lib/app/_test.py, line 50 (also marked with ***):

    async def stream_it(self, msg: Msg):
        "Streams numbers."
        lim = msg.get("lim", -1)
        i = 0
        d = int(msg.get("delay", 0.1) * 1000)
        async with msg.stream_out() as s:
            while i != lim:
                await sleep_ms(d)
                try:
***                 await s.send(i)
                except NoStream:
                    break
                i += 1
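
The control flow involved can be sketched in standalone form. This is not a reproducer (none has been found so far), only an illustration of the pattern: an exception raised inside an awaited coroutine and caught by the caller inside an `async with` block. `FakeStream` and its `limit` parameter are hypothetical stand-ins, `asyncio.sleep(0)` replaces `sleep_ms(d)`, and the sketch assumes CPython's `asyncio`.

```python
import asyncio


class NoStream(Exception):
    """Stand-in for the NoStream exception raised in msg.py."""


class FakeStream:
    """Hypothetical stream that accepts `limit` items, then raises NoStream."""

    def __init__(self, limit):
        self._left = limit

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    async def send(self, value):
        if self._left <= 0:
            raise NoStream  # mirrors the line marked *** in msg.py
        self._left -= 1


async def stream_it(stream, lim=-1):
    """Same loop shape as stream_it in _test.py; returns the items sent."""
    i = 0
    async with stream as s:
        while i != lim:
            await asyncio.sleep(0)
            try:
                await s.send(i)
            except NoStream:
                break
            i += 1
    return i


if __name__ == "__main__":
    print(asyncio.run(stream_it(FakeStream(3))))
```

On CPython the exception is caught by the `except NoStream` clause and the loop ends normally. In the crashing MicroPython run, the segfault occurs while the traceback entry for the `raise NoStream` line is being added.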

At least one other test (tests/moat_micro/test_cfg.py::test_cfg) also reaches line 357, but does not crash.

Code of Conduct

Yes, I agree

CANDIDATE · ISSUE

Crash compiling(?) unusual code

closed · by jepler · opened 2025-08-02 · updated 2025-08-02
bug

Port, board and/or hardware

unix port, coverage, x86_64

MicroPython version

MicroPython v1.26.0-preview.521.g658a2e3dbd on 2025-08-02; linux [GCC 12.2.0] version

Reproduction

Run micropython with a snippet of unusual code:

$ ./build-coverage/micropython -c 'ans = (-1) ** 2.3; aa'
Segmentation fault

Expected behaviour

A NameError, because aa is not defined
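
For comparison, the expected outcome can be checked against CPython, used here only as a reference implementation. The helper `run_snippet` is a hypothetical name introduced for this sketch; it executes the source string in a fresh namespace and reports the name of the exception type that was raised, if any.

```python
def run_snippet(src):
    """Execute src in an empty namespace; return the exception type name or None."""
    try:
        exec(compile(src, "<snippet>", "exec"), {})
    except Exception as exc:  # report whatever the snippet raised
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # On CPython, (-1) ** 2.3 evaluates to a complex number, so the
    # first statement succeeds and the undefined name `aa` raises NameError.
    print(run_snippet("ans = (-1) ** 2.3; aa"))
```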

Observed behaviour

A segfault

Additional Information

The stack trace is corrupt. Neither UBSan nor ASan gave any more useful information.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555756530 in mp_state_ctx ()
(gdb) where
#0  0x0000555555756530 in mp_state_ctx ()
#1  0x0000000000000000 in ?? ()

valgrind produced multiple diagnostics beginning with this, which looks interesting:

==1669951== Invalid write of size 8
==1669951==    at 0x16C3B4: nlr_jump (nlrx64.c:104)
==1669951==    by 0x1B23DE: fun_bc_call (objfun.c:352)
==1669951==    by 0x19E61E: mp_call_function_n_kw (runtime.c:727)
==1669951==    by 0x1A0DEA: mp_call_function_0 (runtime.c:701)
==1669951==    by 0x264DB8: execute_from_lexer (main.c:162)
==1669951==    by 0x264E67: do_str (main.c:315)
==1669951==    by 0x2658D3: main_ (main.c:656)
==1669951==    by 0x26619F: main (main.c:494)
==1669951==  Address 0x1ffefff888 is on thread 1's stack
==1669951==  232 bytes below stack pointer

I suspect that an nlr_buf_t registered inside fold_constants (via binary_op_maybe) is still linked into the NLR chain and comes into play later, when the NameError is raised:

Breakpoint 1, nlr_push (nlr=nlr@entry=0x7fffffffdb10) at ../../py/nlrx64.c:55
55	unsigned int nlr_push(nlr_buf_t *nlr) {
(gdb) where
#0  nlr_push (nlr=nlr@entry=0x7fffffffdb10) at ../../py/nlrx64.c:55
#1  0x00005555556b0ae5 in execute_from_lexer (source_kind=source_kind@entry=1, 
    source=0x7fffffffe1a9, input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, 
    is_repl=is_repl@entry=false) at main.c:123
#2  0x00005555556b0e68 in do_str (str=<optimized out>) at main.c:315
#3  0x00005555556b18d4 in main_ (argc=argc@entry=3, argv=argv@entry=0x7fffffffddd8)
    at main.c:656
#4  0x00005555556b21a0 in main (argc=3, argv=0x7fffffffddd8) at main.c:494
(gdb) c
Continuing.

Breakpoint 1, nlr_push (nlr=nlr@entry=0x7fffffffd900) at ../../py/nlrx64.c:55
55	unsigned int nlr_push(nlr_buf_t *nlr) {
(gdb) where
#0  nlr_push (nlr=nlr@entry=0x7fffffffd900) at ../../py/nlrx64.c:55
#1  0x00005555555c5ea3 in binary_op_maybe (op=op@entry=MP_BINARY_OP_POWER, 
    lhs=0xffffffffffffffff, rhs=0x7ffff7c491e0, res=res@entry=0x7fffffffd998)
    at ../../py/parse.c:672
#2  0x00005555555c6d42 in fold_constants (parser=parser@entry=0x7fffffffda30, 
    rule_id=rule_id@entry=42 '*', num_args=2) at ../../py/parse.c:780
#3  0x00005555555c6ac2 in push_result_rule (parser=parser@entry=0x7fffffffda30, src_line=1, 
    rule_id=rule_id@entry=42 '*', num_args=<optimized out>) at ../../py/parse.c:1033
#4  0x00005555555c86b7 in mp_parse (lex=lex@entry=0x7ffff7c48bc0, 
    input_kind=input_kind@entry=MP_PARSE_FILE_INPUT) at ../../py/parse.c:1263
#5  0x00005555556b0b5c in execute_from_lexer (source_kind=source_kind@entry=1, 
    source=<optimized out>, input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, 
    is_repl=is_repl@entry=false) at main.c:147
#6  0x00005555556b0e68 in do_str (str=<optimized out>) at main.c:315
#7  0x00005555556b18d4 in main_ (argc=argc@entry=3, argv=argv@entry=0x7fffffffddd8)
    at main.c:656
#8  0x00005555556b21a0 in main (argc=3, argv=0x7fffffffddd8) at main.c:494
(gdb) c
Continuing.

Breakpoint 1, nlr_push (nlr=nlr@entry=0x7fffffffd990) at ../../py/nlrx64.c:55
55	unsigned int nlr_push(nlr_buf_t *nlr) {
(gdb) where
#0  nlr_push (nlr=nlr@entry=0x7fffffffd990) at ../../py/nlrx64.c:55
#1  0x00005555556218fa in mp_execute_bytecode (code_state=code_state@entry=0x7fffffffda20, 
    inject_exc=<optimized out>, inject_exc@entry=0x0) at ../../py/vm.c:301
#2  0x00005555555fe288 in fun_bc_call (self_in=0x7ffff7c48be0, n_args=0, n_kw=0, args=0x0)
    at ../../py/objfun.c:295
#3  0x00005555555ea61f in mp_call_function_n_kw (fun_in=0x7ffff7c48be0, 
    n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0) at ../../py/runtime.c:727
#4  0x00005555555ecdeb in mp_call_function_0 (fun=<optimized out>) at ../../py/runtime.c:701
#5  0x00005555556b0db9 in execute_from_lexer (source_kind=source_kind@entry=1, 
    source=<optimized out>, input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, 
    is_repl=is_repl@entry=false) at main.c:162
#6  0x00005555556b0e68 in do_str (str=<optimized out>) at main.c:315
#7  0x00005555556b18d4 in main_ (argc=argc@entry=3, argv=argv@entry=0x7fffffffddd8)
    at main.c:656
#8  0x00005555556b21a0 in main (argc=3, argv=0x7fffffffddd8) at main.c:494
(gdb) c
Continuing.

Breakpoint 2, nlr_jump (val=0x7ffff7c48ba0) at ../../py/nlrx64.c:103
103	MP_NORETURN void nlr_jump(void *val) {
(gdb) p mp_thread_get_state ()->nlr_top
$3 = (nlr_buf_t *) 0x7fffffffd990
(gdb) c
Continuing.

Breakpoint 2, nlr_jump (val=val@entry=0x7ffff7c48ba0) at ../../py/nlrx64.c:103
103	MP_NORETURN void nlr_jump(void *val) {
(gdb) p mp_thread_get_state ()->nlr_top
$4 = (nlr_buf_t *) 0x7fffffffd900
(gdb) where
#0  nlr_jump (val=val@entry=0x7ffff7c48ba0) at ../../py/nlrx64.c:103
#1  0x00005555555fe3df in fun_bc_call (self_in=<optimized out>, n_args=0, n_kw=0, args=0x0)
    at ../../py/objfun.c:352
#2  0x00005555555ea61f in mp_call_function_n_kw (fun_in=0x7ffff7c48be0, 
    n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0) at ../../py/runtime.c:727
#3  0x00005555555ecdeb in mp_call_function_0 (fun=<optimized out>) at ../../py/runtime.c:701
#4  0x00005555556b0db9 in execute_from_lexer (source_kind=source_kind@entry=1, 
    source=<optimized out>, input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, 
    is_repl=is_repl@entry=false) at main.c:162
#5  0x00005555556b0e68 in do_str (str=<optimized out>) at main.c:315
#6  0x00005555556b18d4 in main_ (argc=argc@entry=3, argv=argv@entry=0x7fffffffddd8)
    at main.c:656
#7  0x00005555556b21a0 in main (argc=3, argv=0x7fffffffddd8) at main.c:494

Notice that nlr_top at the last nlr_jump (0x7fffffffd900) equals the nlr_buf_t that was pushed in binary_op_maybe (called from fold_constants), even though those frames are no longer on the stack.
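
The hypothesis can be illustrated with a toy model of the NLR chain. This is a sketch under stated assumptions, not MicroPython's implementation: `NlrBuf`, `NlrState`, and `simulate` are hypothetical names. It assumes that pushing links a buffer in front of the current top, that a jump unlinks the top buffer and transfers control to it, and that the buffer pushed in binary_op_maybe is never unlinked.

```python
class NlrBuf:
    """Toy stand-in for an nlr_buf_t living in some C stack frame."""

    def __init__(self, owner):
        self.owner = owner        # function whose frame holds the buffer
        self.prev = None          # previous buffer in the chain
        self.frame_alive = True   # False once the owning frame has returned


class NlrState:
    """Toy stand-in for the per-thread nlr_top pointer."""

    def __init__(self):
        self.top = None

    def push(self, buf):
        buf.prev = self.top
        self.top = buf

    def pop(self):
        self.top = self.top.prev

    def jump(self):
        target = self.top
        self.top = target.prev    # unlink, then "jump" to the target
        return target


def simulate(pop_in_fold):
    state = NlrState()
    state.push(NlrBuf("execute_from_lexer"))

    fold = NlrBuf("binary_op_maybe")
    state.push(fold)
    if pop_in_fold:
        state.pop()               # correct behaviour: unlink before returning
    fold.frame_alive = False      # binary_op_maybe has returned

    vm = NlrBuf("mp_execute_bytecode")
    state.push(vm)
    first = state.jump()          # NameError raised inside the VM
    vm.frame_alive = False
    second = state.jump()         # re-raised from fun_bc_call
    return first.owner, second.owner, second.frame_alive


if __name__ == "__main__":
    print(simulate(pop_in_fold=False))
    print(simulate(pop_in_fold=True))
```

Without the pop, the second jump targets the buffer owned by binary_op_maybe, whose frame is gone, which matches the nlr_top values seen in the gdb session. With the pop, the second jump lands in execute_from_lexer as intended.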

This crash was found with AFLplusplus and minimized manually.

Code of Conduct

Yes, I agree
