MicroPython allows creation of non-UTF-8 identifiers
Port, board and/or hardware
unix, standard build
MicroPython version
MicroPython v1.28.0-preview.18.g6341258207 on 2025-12-26; linux [GCC 14.2.0] version
Reproduction
In the REPL, call exec() on a bytes object that contains invalid UTF-8:
>>> exec(b"a\xff = None")
>>> dir()
['__name__', 'a\x00']
Expected behaviour
An exception should be raised, as https://github.com/micropython/micropython/pull/17862 set out to do.
Observed behaviour
An identifier whose content is not valid UTF-8 text is created, and it is visible in, e.g., dir().
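As a stop-gap on the caller's side, the bytes can be decoded strictly before they reach exec(). This is only a sketch: exec_checked is a hypothetical helper, not part of MicroPython, and it assumes the build validates UTF-8 when decoding bytes (CPython always does).

```python
def exec_checked(source, namespace=None):
    """Hypothetical helper: refuse byte sources that are not valid UTF-8."""
    if namespace is None:
        namespace = {}
    if isinstance(source, (bytes, bytearray)):
        # Strict decoding raises UnicodeError on invalid sequences such as b"\xff".
        # Assumption: the interpreter checks UTF-8 validity in bytes.decode().
        source = bytes(source).decode("utf-8")
    exec(source, namespace)
    return namespace


# Valid source still runs as before.
ns = exec_checked(b"a = 1")
print(ns["a"])

# The input from the report is rejected before any identifier is created.
try:
    exec_checked(b"a\xff = None")
except UnicodeError:
    print("rejected: source is not valid UTF-8")
```

This does not fix the lexer; it only keeps invalid bytes from reaching it.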
Additional Information
This is separate from the question of exactly following CPython's rules for which Unicode code points can form identifiers. For example, MicroPython accepts 💡= True but CPython rejects it.
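To illustrate that second, separate question, here is a small probe that can be run on both interpreters. accepts_assignment is a hypothetical name, and the sketch assumes compile() is available in the build being tested.

```python
def accepts_assignment(name):
    """Return True if this interpreter compiles `name = True` without error."""
    try:
        compile(name + " = True", "<probe>", "exec")
    except SyntaxError:
        return False
    return True


# An ordinary ASCII name is accepted everywhere.
print("lamp:", accepts_assignment("lamp"))

# CPython rejects this one; per the report above, MicroPython accepts it.
print("emoji:", accepts_assignment("\U0001F4A1"))
```

A difference in the second result is about which code points are allowed, not about whether the source bytes are valid UTF-8.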
Code of Conduct
Yes, I agree
Viper: ptr8(0)[0] = 1 causes SIGSEGV on unix port
Port, board and/or hardware
Unix
MicroPython version
MicroPython v1.27.0-preview.107.gd1607598f on 2025-09-09; linux [GCC 14.2.0] version
Reproduction
try:
    import micropython
except ImportError:
    print("SKIP missing micropython")
else:
    try:
        @micropython.viper
        def poke0():
            p = ptr8(0)
            p[0] = 1
        try:
            poke0()
            print("should not reach here")
        except Exception as e:
            print("EXC", type(e).__name__)
    except AttributeError:
        print("SKIP viper_not_available")
Expected behaviour
It would be helpful if either:
- Viper rejected obviously invalid addresses like 0 (and perhaps addr + size overflow) with a Python exception in a debug/safe build mode, or
- The docs explicitly called out that such code will crash the process.
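To make the first option concrete, the checks I have in mind look roughly like the plain-Python model below. Everything here is hypothetical: check_access and WORD_BITS are illustrative names, the 64-bit width is an assumption matching an x86-64 host, and a real implementation would have to live in the Viper code generator or runtime rather than in Python.

```python
WORD_BITS = 64  # assumption: 64-bit address space, as on an x86-64 unix build


def check_access(addr, size, word_bits=WORD_BITS):
    """Hypothetical model of a debug-mode check before a Viper pointer store."""
    limit = 1 << word_bits
    if size <= 0:
        raise ValueError("size must be positive")
    # Reject the null address and anything outside the address space.
    if addr <= 0 or addr >= limit:
        raise ValueError("invalid address: " + hex(addr))
    # Reject ranges whose end would wrap past the top of the address space.
    if addr + size > limit:
        raise OverflowError("addr + size overflows the address space")
    return addr


# The store from the reproduction, ptr8(0)[0] = 1, would be refused here.
try:
    check_access(0, 1)
except ValueError as e:
    print("EXC", type(e).__name__)
```

A check like this only catches the obvious cases; an arbitrary non-zero address can still be unmapped and fault.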
Observed behaviour
The process crashes with SIGSEGV at the generated Viper instruction:
[#0] 0x7ffff7fba051 → mov BYTE PTR [rbx], dl
[#1] 0x7ffff7e27f40 → rex.WX sbb rax, QWORD PTR [rax]
[#2] 0x5555555f39e4 → mp_cstack_usage()
[#3] 0x7ffff7e27b60 → rcr ch, 1
Additional Information
No, I've provided everything above.
Code of Conduct
Yes, I agree