#18609 · Issue #17855
Related · high · value 3.340
QUERY · ISSUE

MicroPython allows creation of non-UTF-8 identifiers

open · by jepler · opened 2025-12-26 · updated 2026-01-05
bug · unicode

Port, board and/or hardware

unix, standard build

MicroPython version

MicroPython v1.28.0-preview.18.g6341258207 on 2025-12-26; linux [GCC 14.2.0] version

Reproduction

In the REPL, call exec() on a bytes object containing invalid UTF-8:

>>> exec(b"a\xff = None")
>>> dir()
['__name__', 'a\x00']

Expected behaviour

An exception should be raised, as https://github.com/micropython/micropython/pull/17862 intended.
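
As a rough illustration of the expected behaviour, here is a minimal sketch that validates a bytes source before handing it to exec(). The helper name `checked_exec` is hypothetical and not part of MicroPython or the linked PR. It only shows the idea of rejecting invalid UTF-8 up front, assuming strict decoding raises UnicodeError on bad input.

```python
def checked_exec(source, namespace=None):
    """Hypothetical helper: reject bytes that are not valid UTF-8 before exec()."""
    if namespace is None:
        namespace = {}
    if isinstance(source, (bytes, bytearray)):
        # Strict UTF-8 decoding raises UnicodeError on invalid input,
        # so the invalid identifier is never created.
        source = bytes(source).decode("utf-8")
    exec(source, namespace)
    return namespace


if __name__ == "__main__":
    print("a" in checked_exec(b"a = None"))
    try:
        checked_exec(b"a\xff = None")
    except UnicodeError as exc:
        print("rejected:", type(exc).__name__)
```

This is a workaround at the call site only; the issue asks for the check to happen inside MicroPython itself.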

Observed behaviour

An identifier whose content is not valid UTF-8 text is created and is visible, e.g., in dir().

Additional Information

This is separate from exactly following CPython's rules for which Unicode code points can form identifiers. E.g., MicroPython accepts 💡 = True but CPython rejects it.
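
To show the difference between the two questions, the following sketch (intended for CPython, where `str.isidentifier()` is available) separates "is this valid UTF-8?" from "is this a valid identifier?". The function name `classify` is made up for this example.

```python
def classify(raw):
    """Hypothetical helper: report why a byte string is or is not an identifier."""
    try:
        text = raw.decode("utf-8")
    except UnicodeError:
        # This is the case the issue is about: not text at all.
        return "invalid UTF-8"
    # This is the separate question of CPython's identifier rules.
    return "identifier" if text.isidentifier() else "valid UTF-8, not an identifier"


if __name__ == "__main__":
    for raw in (b"a", "💡".encode("utf-8"), b"a\xff"):
        print(raw, "->", classify(raw))
```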

Code of Conduct

Yes, I agree

CANDIDATE · ISSUE

Crash printing exception detail when source code is not valid UTF-8

open · by jepler · opened 2025-08-07 · updated 2026-01-20
bug · unicode

Port, board and/or hardware

unix port, coverage build, x86_64 linux

MicroPython version

MicroPython v1.26.0-preview.524.g255d74b5a8 on 2025-08-06; linux [GCC 12.2.0] version

Reproduction

# Slight changes (like removing the derived exception type) move the misbehavior
# around. For instance, in my local build, not having this triggers
# 'NotImplementedError: opcode' instead.
class Dummy(BaseException):
    pass

# Smuggle an invalid UTF-8 string into decompress_error_text_maybe.
# This invalid UTF-8 string passes the MP_IS_COMPRESSED_ROM_STRING test.
# This can also happen if the input file is not a valid UTF-8 file.
b = eval(b"'\xff" + b"\xfe" * 4096 + b"'")
try:
    raise BaseException(b)
except BaseException as good:
    print(type(good), good.args[0])

Expected behaviour

CPython fails the eval() with SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte.

Observed behaviour

The invalid UTF-8 string is created successfully. When the exception's args property is fetched, a crash occurs inside mp_decompress_rom_string (which should never have been called). The call occurs because the first byte of the invalid UTF-8 string is \xff, the marker for compressed ROM strings.
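
The collision can be modelled in a few lines of Python. This is only a sketch of the check as described above, not MicroPython's actual C code. The names `COMPRESSED_MARKER`, `looks_compressed` and `is_valid_utf8` are made up for this example.

```python
COMPRESSED_MARKER = 0xFF  # first-byte marker, as described in the issue text


def looks_compressed(data):
    """Model of a first-byte test for the compressed-string marker."""
    return len(data) > 0 and data[0] == COMPRESSED_MARKER


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeError:
        return False


if __name__ == "__main__":
    payload = b"\xff" + b"\xfe" * 4096
    print(looks_compressed(payload), is_valid_utf8(payload))
    text = "héllo 💡".encode("utf-8")
    print(looks_compressed(text), is_valid_utf8(text))
```

The byte 0xFF never occurs in well-formed UTF-8, so a first-byte marker of this kind is unambiguous only as long as every string reaching the check has been validated as UTF-8.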

Additional Information

Found via fuzzer, manually minimized.

Code of Conduct

Yes, I agree
