QUERY · ISSUE

Crash printing exception detail when source code is not valid UTF-8

open · by jepler · opened 2025-08-07 · updated 2026-01-20

bug · unicode

Port, board and/or hardware

unix port, coverage build, x86_64 linux

MicroPython version

MicroPython v1.26.0-preview.524.g255d74b5a8 on 2025-08-06; linux [GCC 12.2.0] version

Reproduction

# Slight changes (like removing the derived exception type) move the misbehavior
# around. For instance, in my local build, not having this triggers
# 'NotImplementedError: opcode' instead.
class Dummy(BaseException):
    pass

# Smuggle invalid UTF-8 string into decompress_error_text_maybe
# This invalid UTF-8 string matches the MP_IS_COMPRESSED_ROM_STRING test
# This can also happen if the input file is not a valid UTF-8 file.
b = eval(b"'\xff" + b"\xfe" * 4096 + b"'")
try:
    raise BaseException(b)
except BaseException as good:
    print(type(good), good.args[0])

Expected behaviour

CPython fails the eval() with SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte.
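For comparison, the CPython side of this can be checked with a small sketch. The helper name `rejects_source` is hypothetical, and the exact message text may differ between CPython versions, so only the exception type is checked.

```python
def rejects_source(source: bytes) -> bool:
    """Return True if eval() refuses the source with a SyntaxError."""
    try:
        eval(source)
    except SyntaxError:
        return True
    return False


# Same payload as in the reproduction: a quoted literal holding invalid UTF-8.
payload = b"'\xff" + b"\xfe" * 4096 + b"'"

print(rejects_source(payload))  # True under CPython, per the behaviour quoted above
print(rejects_source(b"'ok'"))  # False: well-formed source evaluates normally
```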

Observed behaviour

MicroPython creates the invalid UTF-8 string without error. Fetching the exception's args property then crashes inside mp_decompress_rom_string, which should never have been called. The call occurs because the first byte of the invalid UTF-8 string is \xff, the marker for compressed ROM strings.
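The collision relies on the byte 0xff never appearing in well-formed UTF-8. A minimal sketch of that property, meant to be run under CPython; `COMPRESSED_MARKER`, `is_valid_utf8` and `looks_compressed` are illustrative names, not MicroPython identifiers, and the first-byte check is only an assumed stand-in for the real test.

```python
COMPRESSED_MARKER = 0xFF  # marker byte for compressed ROM strings, per the issue text


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_compressed(data: bytes) -> bool:
    """Hypothetical stand-in for a first-byte marker check."""
    return len(data) > 0 and data[0] == COMPRESSED_MARKER


payload = b"\xff" + b"\xfe" * 4096
print(looks_compressed(payload), is_valid_utf8(payload))  # True False

# Spot check at the encoding-length boundaries: no encoded code point contains 0xff.
sample = [0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]
print(any(COMPRESSED_MARKER in chr(cp).encode("utf-8") for cp in sample))  # False
```

Under these assumptions, rejecting invalid UTF-8 when the string is created would be enough to keep the marker byte out of str objects.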

Additional Information

Found via fuzzer, manually minimized.


CANDIDATE · ISSUE

formatting character values >= 128 gives unexpected results, can crash

open · by jepler · opened 2023-11-28 · updated 2026-01-20

bug · unicode

$ ./build-coverage/micropython
MicroPython 9c7067d9ad on 2023-11-28; linux [GCC 12.2.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> s = f"{160:c}"
>>> len(s)
0
>>> s
' '
>>> print(s)
�

The same for s = "%c" % 160.

I think this is because the 'c' formatter in objstr.c doesn't handle non-ASCII characters properly. It looks like this ends up being another way to get improper UTF-8 into a str() object, too.
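As a reference point, this is how the same conversions behave under CPython: the 'c' conversion yields one code point whose UTF-8 encoding is two bytes, whereas the raw byte 160 on its own is not valid UTF-8. The variable names are illustrative only.

```python
s = f"{160:c}"

# Both formatting styles agree with chr() for code point 160 (U+00A0).
same = s == "%c" % 160 == chr(160)
print(same)  # True

print(len(s))  # 1: one code point
encoded = s.encode("utf-8")
print(encoded)  # b'\xc2\xa0': two bytes

# A lone 0xa0 byte is a continuation byte with no lead byte, so it does not decode.
lone = bytes([160])
try:
    lone.decode("utf-8")
    lone_is_valid = True
except UnicodeDecodeError:
    lone_is_valid = False
print(lone_is_valid)  # False
```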

This can lead to a crash when an invalid string beginning with the byte 255 is generated, just like #17855:

MicroPython v1.27.0-preview.95.g9939565d50 on 2025-09-03; linux [GCC 14.2.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> raise ValueError(f"{255:c}" + f"{254:c}" * 4096)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Segmentation fault

Field widths and precisions operate on bytes, not code points, so print('%.1s' % chr(233)) also produces invalid UTF-8.
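The difference between cutting by bytes and cutting by code points can be sketched as follows. Both helper names are hypothetical; `truncate_bytes` only imitates the byte-wise behaviour described here and is not MicroPython's implementation.

```python
def truncate_bytes(text: str, n: int) -> bytes:
    """Cut the UTF-8 encoding after n bytes (imitates the reported behaviour)."""
    return text.encode("utf-8")[:n]


def truncate_code_points(text: str, n: int) -> bytes:
    """Cut after n code points, then encode (what CPython's '%.1s' amounts to)."""
    return text[:n].encode("utf-8")


ch = chr(233)  # U+00E9, encoded as b'\xc3\xa9'

print(truncate_bytes(ch, 1))        # b'\xc3': a lead byte with no continuation
print(truncate_code_points(ch, 1))  # b'\xc3\xa9': still well-formed
print(("%.1s" % ch).encode("utf-8") == truncate_code_points(ch, 1))  # True under CPython
```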