← index #13220Issue #13002
Off-topic · high · value 2.141
QUERY · ISSUE

heap buffer overflow found at micropython/lib/re1.5/compilecode.c:68 [micropython@a5bdd39127]

openby zeroone-kropened 2023-12-18updated 2024-01-31
bug

Hi all, I'm Wonil Jang, from the research group S2Lab in UNIST.
We found a heap buffer overflow bug from micropython by our custom tool. The detailed information is as follows.

Environment

  • OS: Ubuntu 22.04
  • Version: micropython at commit a5bdd39127
  • Build: unix ports
  • Bug Type: heap-buffer-overflow
  • Bug Location: micropython/lib/re1.5/compilecode.c:68
  • Credits: Junwha Hong and Wonil Jang, from S2Lab in UNIST

Problem Statement

...
case '[': {
            int cnt;
            term = PC;
            re++;
            if (*re == '^') {
                EMIT(PC++, ClassNot);
                re++;
            } else {
                EMIT(PC++, Class);
            }
            PC++; // Skip # of pair byte
            prog->len++;
            for (cnt = 0; *re != ']'; re++, cnt++) {    <== [1]
                char c = *re;
                if (c == '\\') {
                    ++re;
                    c = *re;
                    if (MATCH_NAMED_CLASS_CHAR(c)) {
                        c = RE15_CLASS_NAMED_CLASS_INDICATOR;
                        goto emit_char_pair;
                    }
                }
                if (!c) return NULL;
                if (re[1] == '-' && re[2] != ']') {    <== [2]
                    re += 2;
                }
            emit_char_pair:
                EMIT(PC++, c);
                EMIT(PC++, *re);
            }
            EMIT_CHECKED(term + 1, cnt);
            break;
        }
...

At [1], We found a heap-buffer-overflow bug. This is because there is no verification of regex string length.
For instance, If we have a regex string ([a-zA-Z_0-(null), it will process 0-(null) finally.
and then, at [2], re pointer will point to next byte after 0-(null).
Because It doesn’t have verification logic of regex string length, it will try to read a byte after 0-(null), which means out-of-bound read.

Example (replication)

>>> import re
>>> re.sub(r"([a-zA-Z_0-", "b", "1a2a3a")
=================================================================
    #0 0x5555557b9461 in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:68:27
    #1 0x5555557b97ed in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:103:18
    #2 0x5555557b78da in re1_5_sizecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:194:9
    #3 0x5555557b78da in mod_re_compile /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:427:16
    #4 0x5555557b6cd7 in re_sub_helper /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:287:16

Patch

The simplest patch will be adding verification of regex re length.

Log

    #0 0x5555557b9461 in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:68:27
    #1 0x5555557b97ed in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:103:18
    #2 0x5555557b78da in re1_5_sizecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:194:9
    #3 0x5555557b78da in mod_re_compile /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:427:16
    #4 0x5555557b6cd7 in re_sub_helper /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:287:16
    #5 0x555555782c1c in mp_execute_bytecode /home/qbit/testing-2023/micropython/ports/unix/../../py/vm.c:1042:21
    #6 0x55555574261b in fun_bc_call /home/qbit/testing-2023/micropython/ports/unix/../../py/objfun.c:273:42
    #7 0x555555903f3d in execute_from_lexer /home/qbit/testing-2023/micropython/ports/unix/main.c:161:13
    #8 0x555555902ad5 in do_file /home/qbit/testing-2023/micropython/ports/unix/main.c:310:12
    #9 0x555555902ad5 in main_ /home/qbit/testing-2023/micropython/ports/unix/main.c:722:19
    #10 0x7ffff7c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #11 0x7ffff7c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #12 0x555555593a34 in _start (/home/qbit/testing-2023/micropython/ports/unix/build-standard/micropython+0x3fa34)

Thank you for reading my report.

CANDIDATE · ISSUE

heap-buffer-over-flow(.mpy): from importing malformed npy module

closedby junwhaopened 2023-11-17updated 2023-11-21
bug

Because this bug is complicated, it would be better to read this report from PDF report.
heap-buffer-overflow #1.pdf

Thank you for taking the time to review our bug report! :)

Summary

  • OS: Ubuntu 22.04
  • version: micropython@a00c9d56db775ee5fc14c2db60eb07bab8e872dd
  • port: unix
  • contribution: Junwha Hong and Wonil Jang @S2-Lab, UNIST
  • description:
    • Importing malformed .mpy module leads to buffer overflow at byte code executor
    • A code chunk was allocated at persistentcode.c:249
    • The chunk was accessed at at vm.c:311 by code pointer ip
      • which was pre-determined at mp_setup_code_state_helper (bc.c:140)

PoC Generation

PoC can be generated from bellow code, bof_module_gen.py

try:
    import sys, io, os

    sys.implementation._mpy
    io.IOBase
    os.mount
except (ImportError, AttributeError):
    print("Error")
    raise SystemExit

mpy_arch = sys.implementation._mpy >> 8

# these are the test .mpy files
valid_header = bytes([77, 6, mpy_arch, 31])
# fmt: off
user_files = {
    # bad architecture (mpy_arch needed for sub-version)
    '/mod1.mpy': valid_header + (
        b'\x02' # n_qstr
        b'\x00' # n_obj
        b'\x02' # 2 children
         b'\x42' # 8 bytes, no children, viper code
                b'\x00[x00' # dummy machine code
    ),
}
# fmt: on

with open("bof_module.mpy", "wb") as f:
    f.write(user_files['/mod1.mpy'])
micropython bof_module_gen.py
micropython -c "import bof_module"

Crash Log

#0 0x5555557855f4 in mp_execute_bytecode /home/qbit/testing-2023/micropython/ports/unix/../../py/vm.c
#1 0x55555574261b in fun_bc_call /home/qbit/testing-2023/micropython/ports/unix/../../py/objfun.c:273:42
#2 0x555555777269 in do_load /home/qbit/testing-2023/micropython/ports/unix/../../py/builtinimport.c
#3 0x555555776022 in process_import_at_level /home/qbit/testing-2023/micropython/ports/unix/../../py/builtinimport.c:512:9
#4 0x555555776022 in mp_builtin___import___default /home/qbit/testing-2023/micropython/ports/unix/../../py/builtinimport.c:607:35
#5 0x5555557276b0 in mp_import_name /home/qbit/testing-2023/micropython/ports/unix/../../py/runtime.c:1525:12
#6 0x555555782770 in mp_execute_bytecode /home/qbit/testing-2023/micropython/ports/unix/../../py/vm.c:1235:21
#7 0x55555574261b in fun_bc_call /home/qbit/testing-2023/micropython/ports/unix/../../py/objfun.c:273:42
#8 0x555555903f3d in execute_from_lexer /home/qbit/testing-2023/micropython/ports/unix/main.c:161:13
#9 0x555555902f5d in do_str /home/qbit/testing-2023/micropython/ports/unix/main.c:314:12
#10 0x555555902f5d in main_ /home/qbit/testing-2023/micropython/ports/unix/main.c:635:23
#11 0x7ffff7c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#12 0x7ffff7c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#13 0x555555593a34 in _start (/home/qbit/testing-2023/micropython/ports/unix/build-standard/micropython+0x3fa34)

Description of malformed module

First of all, we need to keep in mind that how our malformed .npy module was loaded into micropython runtime.

hexdump -C  bof_module.mpy

00000000  4d 06 0a 1f **02 00 02 42  00 5b 78 30 30**           |M......B.[x00|
0000000d

architecture of .mpy file from docs

size field value (hex)
byte value 0x4d (ASCII ‘M’) 4d
byte .mpy major version number 06
byte native arch and minor version number (was feature flags in older versions) 0a
byte number of bits in a small int 1f

little endian

size field value (hex)
vuint number of qstrs 02
vuint number of constant objects 00
qstr data {len, data}
{02>>1=01, 42}
{00>>1=00}
encoded constant objects -
size field value
vuint type, size and whether there are sub-raw-code elements 5b (type=?
size=15)
code (bytecode or machine code) ?
vuint number of sub-raw-code elements (only if non-zero) ?
sub-raw-code elements ?

If we validate this by GDB in runtime, from py/persistentcode.c:405.


(gdb) x/40b reader->data
0x7ffff7a05320: 0x60 0x53 0xa0 0xf7 0xff 0x7f 0x00 0x00
0x7ffff7a05328: 0x0d 0x00 0x00 0x00 0x4d 0x06 0x0a 0x1f
0x7ffff7a05330: 0x02 0x00 0x02 0x42 0x00 0x5b 0x78 0x30
0x7ffff7a05338: 0x30 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffff7a05340: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

(0x00 will be 0xff)


From the memory map, it is clear that the valid code length is just 3 bytes, but it tries to read 15 bytes. thus, it reads 0xFF, which is EOF signal returned by mp_reader_vfs_readbyte.

STATIC mp_raw_code_t *load_raw_code(mp_reader_t *reader, mp_module_context_t *context) {
    // Load function kind and data length
    **size_t kind_len = read_uint(reader);**
    int kind = (kind_len & 3) + MP_CODE_BYTECODE;
    bool has_children = !!(kind_len & 4);
    size_t fun_data_len = kind_len >> 3;

    #if !MICROPY_EMIT_MACHINE_CODE
    if (kind != MP_CODE_BYTECODE) {
        mp_raise_ValueError(MP_ERROR_TEXT("incompatible .mpy file"));
    }
    #endif

    uint8_t *fun_data = NULL;
    #if MICROPY_EMIT_MACHINE_CODE
    size_t prelude_offset = 0;
    mp_uint_t native_scope_flags = 0;
    mp_uint_t native_n_pos_args = 0;
    mp_uint_t native_type_sig = 0;
    #endif

    if (kind == MP_CODE_BYTECODE) {
        // Allocate memory for the bytecode
        **fun_data = m_new(uint8_t, fun_data_len);
        // Load bytecode
        read_bytes(reader, fun_data, fun_data_len);**

...

but the root cause is here (although neglecting file reader EOF is also a problem), but at the actual parsing code of this 15 bytes fun_data .

Problem Statement

How was the chunk rc→fun_data accessed with out-of-bound index?

STATIC void do_load(mp_module_context_t *module_obj, vstr_t *file) {
   ...
        mp_raw_code_load_file(**file_qstr**, &cm); <- allocation
        do_execute_raw_code(cm.context, cm.rc, **file_qstr**); <- bof
...

by inserting breakpoint on the allocation site of fun_data, we can get the information of the chunk fun_data

b persistentcode.c:249
run
  • fun_data: [0x7ffff7a053c0, 0x7ffff7a053cf)
  • fun_data_len: 15

Also, if we check the further executions with n command, it is clear that fun_data is shipped into bytecode field of the function object.

mp_obj_t mp_make_function_from_raw_code(const mp_raw_code_t *rc, const mp_module_context_t *context, const mp_obj_t *def_args) {
    DEBUG_OP_printf("make_function_from_raw_code %p\n", rc);
    assert(rc != NULL);

    // def_args must be MP_OBJ_NULL or a tuple
    assert(def_args == NULL || def_args[0] == MP_OBJ_NULL || mp_obj_is_type(def_args[0], &mp_type_tuple));

    // def_kw_args must be MP_OBJ_NULL or a dict
    assert(def_args == NULL || def_args[1] == MP_OBJ_NULL || mp_obj_is_type(def_args[1], &mp_type_dict));

    // make the function, depending on the raw code kind
    mp_obj_t fun;
    switch (rc->kind) {
    ...
        default:
            // rc->kind should always be set and BYTECODE is the only remaining case
            assert(rc->kind == MP_CODE_BYTECODE);
            **fun = mp_obj_new_fun_bc(def_args, rc->fun_data, context, rc->children);**
           ...
mp_obj_t mp_obj_new_fun_bc(const mp_obj_t *def_args, const byte *code, const mp_module_context_t *context, struct _mp_raw_code_t *const *child_table) {
...
    mp_obj_fun_bc_t *o = mp_obj_malloc_var(mp_obj_fun_bc_t, mp_obj_t, n_extra_args, &mp_type_fun_bc);
    o->bytecode = code;
...
    return MP_OBJ_FROM_PTR(o);
}
(gdb) x/10g o
0x7ffff7a05300: 0x00005555556313a0      0x0000000000000000
0x7ffff7a05310: 0x0000000000000000      0x00007ffff7a053c0
0x7ffff7a05320: 0x00000000000000ba      0x0000000000001f12
0x7ffff7a05330: 0x0000000000000b12      0x0000000000001f1a
0x7ffff7a05340: 0x0000000000000000      0x0000000000000000

From the generated module_fun, it executes a function (our target chunk is now in module_fun.bytecode)

mp_obj_t module_fun = mp_make_function_from_raw_code(rc, context, NULL);
mp_call_function_0(module_fun);

And we observed that the code_state→self function has the overflowed ip field exactly after INIT_CODESTATE.

STATIC mp_obj_t fun_bc_call(mp_obj_t self_in, size_t n_args, size_t n_kw, const mp_obj_t *args) {
    ...
    mp_obj_fun_bc_t *self = MP_OBJ_TO_PTR(self_in); // 0x7ffff7a05300

    size_t n_state, state_size; // 7, 56
    DECODE_CODESTATE_SIZE(self->bytecode, n_state, state_size);

    // allocate state for locals and stack
    mp_code_state_t *code_state = NULL;
    #if MICROPY_ENABLE_PYSTACK
    code_state = mp_pystack_alloc(offsetof(mp_code_state_t, state) + state_size);
    #else
    if (state_size > VM_MAX_STATE_ON_STACK) {
        code_state = m_new_obj_var_maybe(mp_code_state_t, state, byte, state_size);
        #if MICROPY_DEBUG_VM_STACK_OVERFLOW
        if (code_state != NULL) {
            memset(code_state->state, 0, state_size);
        }
        #endif
    }
    if (code_state == NULL) {
        code_state = alloca(offsetof(mp_code_state_t, state) + state_size);
        #if MICROPY_DEBUG_VM_STACK_OVERFLOW
        memset(code_state->state, 0, state_size);
        #endif
        state_size = 0; // indicate that we allocated using alloca
    }
    #endif

    **INIT_CODESTATE(code_state, self, n_state, n_args, n_kw, args);**

    // execute the byte code with the correct globals context
    mp_globals_set(self->context->module.globals);
    **mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL); <- BOF Access**
...
}

Further investigation of INIT_CODESTATE

INIT_CODESTATE is caller macro of mp_setup_code_state function

#define INIT_CODESTATE(code_state, _fun_bc, _n_state, n_args, n_kw, args) \
    code_state->fun_bc = _fun_bc; \
    code_state->n_state = _n_state; \
    mp_setup_code_state(code_state, n_args, n_kw, args); \
    code_state->old_globals = mp_globals_get();

we can observe that ip is set as bytecode (the chunk [0x7ffff7a053c0, 0x7ffff7a053cf)) at mp_setup_code_state. till now, the ip is in valid state.

void mp_setup_code_state(mp_code_state_t *code_state, size_t n_args, size_t n_kw, const mp_obj_t *args) {
    code_state->ip = code_state->fun_bc->bytecode; // 0x7ffff7a053c0
    code_state->sp = &code_state->state[0] - 1;
    #if MICROPY_STACKLESS
    code_state->prev = NULL;
    #endif
    #if MICROPY_PY_SYS_SETTRACE
    code_state->prev_state = NULL;
    code_state->frame = NULL;
    #endif
    **mp_setup_code_state_helper(code_state, n_args, n_kw, args);**
}
b mp_setup_code_state
run
c

Inside mp_setup_code_state_helper, ip changes its value by ip = code_state->ip + n_info; and here, invalid address 0x7ffff7a053da is assigned.

It means that the n_info is the root cause!

// On entry code_state should be allocated somewhere (stack/heap) and
// contain the following valid entries:
//    - code_state->fun_bc should contain a pointer to the function object
//    **- code_state->ip should contain a pointer to the beginning of the prelude**
//    - code_state->sp should be: &code_state->state[0] - 1
//    - code_state->n_state should be the number of objects in the local state
STATIC void mp_setup_code_state_helper(mp_code_state_t *code_state, size_t n_args, size_t n_kw, const mp_obj_t *args) {
    // This function is pretty complicated.  It's main aim is to be efficient in speed and RAM
    // usage for the common case of positional only args.

    // get the function object that we want to set up (could be bytecode or native code)
    mp_obj_fun_bc_t *self = code_state->fun_bc;

    // Get cached n_state (rather than decode it again)
    size_t n_state = code_state->n_state;

    // Decode prelude
    size_t n_state_unused, n_exc_stack_unused, scope_flags, n_pos_args, n_kwonly_args, n_def_pos_args;
    **MP_BC_PRELUDE_SIG_DECODE_INTO(code_state->ip, n_state_unused, n_exc_stack_unused, scope_flags, n_pos_args, n_kwonly_args, n_def_pos_args);
    MP_BC_PRELUDE_SIZE_DECODE(code_state->ip);**
    (void)n_state_unused;
    (void)n_exc_stack_unused;

   ...

    // jump over code info (source file, argument names and line-number mapping)
    **const uint8_t *ip = code_state->ip + n_info;**

    // bytecode prelude: initialise closed over variables
    for (; n_cell; --n_cell) {
        size_t local_num = *ip++;
        code_state_state[n_state - 1 - local_num] =
            mp_obj_new_cell(code_state_state[n_state - 1 - local_num]);
    }

    // now that we skipped over the prelude, set the ip for the VM
    **code_state->ip = ip;**

    DEBUG_printf("Calling: n_pos_args=%d, n_kwonly_args=%d\n", n_pos_args, n_kwonly_args);
    dump_args(code_state_state + n_state - n_pos_args - n_kwonly_args, n_pos_args + n_kwonly_args);
    dump_args(code_state_state, n_state);
}

n_info was constructed over 15 (the allocated length for the code chunk), at MP_BC_PRELUDE_SIZE_DECODE.

#define MP_BC_PRELUDE_SIZE_DECODE(ip) \
    size_t n_info, n_cell; \
    MP_BC_PRELUDE_SIZE_DECODE_INTO(ip, n_info, n_cell); \
    (void)n_info; (void)n_cell

#define MP_BC_PRELUDE_SIZE_DECODE_INTO(ip, I, C)                \
    do {                                                            \
        uint8_t z;                                                  \
        C = 0;                                                      \
        I = 0;                                                      \
        for (unsigned n = 0;; ++n) {                                \
            z = *(ip)++;                                            \
            /* xIIIIIIC */                                          \
            C |= (z & 1) << n;                                      \
            I |= ((z & 0x7e) >> 1) << (6 * n);                      \
            if (!(z & 0x80)) {                                      \
                break;                                              \
            }                                                       \
        }                                                           \
    } while (0) 

from GDB, we observed that MP_BC_PRELUDE_SIG_DECODE_INTO increases ip as 0x7ffff7a053c1, and then, MP_BC_PRELUDE_SIZE_DECODE reads n_info from there.

MP_BC_PRELUDE_SIG_DECODE_INTO read z as 2, (p printf("%d", ((char*)0x7ffff7a053c1)) ⇒ 2)

MP_BC_PRELUDE_SIZE_DECODE construct n_info by

at n == 0;

I |= ((z & 0x7e) >> 1) << (6 * n);

z & 0x7e = 0x80
0x80 >> 1 = 0x40
0x40 << (6*n) = 0x40

and, because next one is 0xFF it escape here

**=> n_info = 0x40 = 24 which is incorrectly rebase the code pointer ip later**

and after the first defined n_info, it was increased by 2 again, then the code pointer ip was rebased with the offset 26.

Thus, the root cause is that it does not validate the length of the code_state→ip before it rebase the ip.

To summarize,

  • Chunk was allocated as: 0x7ffff7a053c0 with 0xf bytes => [0x7ffff7a053c0, 0x7ffff7a053cf)
    • at persistentcode.c:249
  • And accessed with out-of-bounds at: 0x7ffff7a053da (code pointer ip)
    • at vm.c:311
    • which was determined at mp_setup_code_state_helper (bc.c:140)

Patch

To prevent this kind of malformed input leading to the buffer overflow, we need to move the length related parsing functions to parsers and enforcing the validation logic while reading the file.

Discussion

We think you already tested this kind of features in the tests of tests/micropython/import_mpy*.py, but it seems it’s not sufficient.

Also, this bug can be seemed not that critical because importing this module produce “NotImplementedError: opcode”, but it’s actually very critical because the out-of-bounds occurs before the NotImplementedError, and it means that this buffer-over-flow can affect runtime if a user has code like:

try:
    import module_A # module_A is malformed by attacker
except:
    import module_A_alternative

...

# Normal routine

Thank you for taking the time to review our bug report! :)

Keyboard

j / / n
next pair
k / / p
previous pair
1 / / h
show query pane
2 / / l
show candidate pane
c
copy suggested comment
r
toggle reasoning
g i
go to index
?
show this help
esc
close overlays

press ? or esc to close

copied