QUERY · ISSUE

Micropython allows creation of non UTF-8 identifiers

open · by jepler · opened 2025-12-26 · updated 2026-01-05

bug · unicode

Port, board and/or hardware

unix, standard build

MicroPython version

MicroPython v1.28.0-preview.18.g6341258207 on 2025-12-26; linux [GCC 14.2.0] version

Reproduction

In the REPL, call exec() on a bytes object containing invalid UTF-8:

>>> exec(b"a\xff = None")
>>> dir()
['__name__', 'a\x00']

Expected behaviour

An exception should be raised, as https://github.com/micropython/micropython/pull/17862 proposed.
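
To illustrate the expected behaviour from the Python side, here is a minimal sketch of a wrapper that strictly decodes byte sources before handing them to exec(). The name exec_checked is hypothetical and is not part of MicroPython; the real fix would presumably live in the lexer or compiler.

```python
def exec_checked(source, namespace=None):
    """Hypothetical wrapper: refuse byte sources that are not valid UTF-8."""
    if namespace is None:
        namespace = {}
    if isinstance(source, (bytes, bytearray)):
        # Strict decoding raises UnicodeError (a ValueError subclass)
        # when the bytes are not well-formed UTF-8.
        source = bytes(source).decode("utf-8")
    exec(source, namespace)
    return namespace


try:
    exec_checked(b"a\xff = None")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

With this wrapper the invalid source never reaches the compiler, so no identifier containing the byte 0xff is created.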

Observed behaviour

An identifier whose content is not valid UTF-8 text is created and can be seen e.g., in dir().

Additional Information

This is separate from exactly following CPython's rules for which Unicode code points can form identifiers. E.g., MicroPython accepts 💡 = True but CPython rejects it.
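
For reference, CPython's identifier rules can be probed with str.isidentifier(). The sketch below is meant to be run on CPython; the helper name cpython_accepts is made up for illustration, and MicroPython builds may not provide isidentifier() at all.

```python
def cpython_accepts(name):
    """Return True if CPython would treat `name` as a valid identifier."""
    return name.isidentifier()


# "\u00e9" is e-acute, "\U0001F4A1" is the light bulb emoji from the example.
for name in ("a", "caf\u00e9", "\U0001F4A1"):
    print(ascii(name), cpython_accepts(name))
```

On CPython the first two names are accepted and the emoji is rejected, since it is a symbol rather than a letter-like code point.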

CANDIDATE · ISSUE

formatting character values >= 128 gives unexpected results, can crash

open · by jepler · opened 2023-11-28 · updated 2026-01-20

bug · unicode

$ ./build-coverage/micropython
MicroPython 9c7067d9ad on 2023-11-28; linux [GCC 12.2.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> s = f"{160:c}"
>>> len(s)
0
>>> s
' '
>>> print(s)
�

The same for s = "%c" % 160.

I think this is because the 'c' formatter in objstr.c doesn't handle non-ASCII characters properly. It looks like this ends up being another way to get improper UTF-8 into a str() object, too.
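
For comparison, here is a sketch of what the 'c' formatter would presumably need to emit: the UTF-8 encoding of the code point, not a single raw byte. This is an assumption about the intended behaviour (matching chr()), not the code in objstr.c, and encode_code_point is a hypothetical name.

```python
def encode_code_point(cp):
    """Return the UTF-8 bytes for a single Unicode code point."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x80:
        # ASCII: one byte, unchanged.
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(encode_code_point(160))  # b'\xc2\xa0'
```

Code point 160 therefore needs two bytes; storing the lone byte 0xa0 in a str is what produces the invalid UTF-8 shown above.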

This can lead to a crash when an invalid string beginning with the byte 255 is generated, just like #17855:

MicroPython v1.27.0-preview.95.g9939565d50 on 2025-09-03; linux [GCC 14.2.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> raise ValueError(f"{255:c}" + f"{254:c}" * 4096)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Segmentation fault

Field widths are applied in bytes, not code points, so you can also produce improper UTF-8 by truncating a multi-byte character: print('%.1s' % chr(233)).
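
As a sketch of the alternative, truncation can be done on code-point boundaries by skipping UTF-8 continuation bytes (those of the form 10xxxxxx). The helper name truncate_code_points is hypothetical, and the sketch assumes its input is already valid UTF-8.

```python
def truncate_code_points(data, n):
    """Return the prefix of UTF-8 `data` holding at most n code points."""
    count = 0
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:  # ASCII or lead byte: a new code point starts here
            if count == n:
                return data[:i]
            count += 1
    return data


encoded = chr(233).encode("utf-8")       # b'\xc3\xa9'
print(encoded[:1])                       # byte-based cut: b'\xc3', invalid on its own
print(truncate_code_points(encoded, 1))  # code-point cut: b'\xc3\xa9'
```

The byte-based slice leaves a lead byte with no continuation byte, which is the improper UTF-8 that '%.1s' produces.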