Discussion of Python 3.9 support
This issue is intended to track the status of Python 3.9 core features as implemented by MicroPython.
Python 3.9.0 (final) was released on 5 October 2020. The features for 3.9 are defined in PEP 596, and a detailed description of the changes can be found in What's New in Python 3.9.
- PEP 584, union operators added to dict;
- PEP 585, type hinting generics in standard collections;
- PEP 614, relaxed grammar restrictions on decorators;
- PEP 616, string methods to remove prefixes and suffixes;
- PEP 593, flexible function and variable annotations;
- PEP 573, fast access to module state from methods of C extension types;
- PEP 617, CPython now uses a new parser based on PEG;
- PEP 615, the IANA Time Zone Database is now present in the standard library in the zoneinfo module;
- PEP 602, CPython adopts an annual release cycle. Instead of an annual cycle, the aim is a two-month release cycle.
- __import__() now raises ImportError instead of ValueError; Done, see 53519e322a5a0bb395676cdaa132f5e82de22909
- Python now gets the absolute path of the script filename specified on the command line (ex: python3 script.py): the __file__ attribute of the __main__ module became an absolute path, rather than a relative path.
- By default, for best performance, the errors argument is only checked at the first encoding/decoding error and the encoding argument is sometimes ignored for empty strings.
- "".replace("", s, n) now returns s instead of an empty string for all non-zero n. It is now consistent with "".replace("", s).
- Any valid expression can now be used as a decorator. Previously, the grammar was much more restrictive.
- Parallel running of aclose() / asend() / athrow() is now prohibited, and ag_running now reflects the actual running status of the async generator.
- Unexpected errors in calling the __iter__ method are no longer masked by TypeError in the in operator and functions contains(), indexOf() and countOf() of the operator module.
- Unparenthesized lambda expressions can no longer be the expression part in an if clause in comprehensions and generator expressions.
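Since the point of this issue is to track which of these features a given build supports, a small probe script may help. The following is only a sketch: the function name probe_features and the result keys are made up for illustration, and it covers just a few of the items above (PEP 584, PEP 616, PEP 614 and the "".replace() change). The decorator test is compiled from a string so that an older parser fails at run time rather than rejecting the whole file.

```python
def probe_features():
    """Return a dict mapping a (hypothetical) feature name to True/False."""
    results = {}

    # PEP 584: dict union operator.
    try:
        results["pep584_dict_union"] = ({"a": 1} | {"b": 2}) == {"a": 1, "b": 2}
    except TypeError:
        results["pep584_dict_union"] = False

    # PEP 616: str.removeprefix() / str.removesuffix().
    try:
        results["pep616_removeprefix"] = (
            "test_name".removeprefix("test_") == "name"
            and "name.py".removesuffix(".py") == "name"
        )
    except AttributeError:
        results["pep616_removeprefix"] = False

    # PEP 614: any expression as a decorator (here, a subscript).
    src = "decos = [lambda f: f]\n@decos[0]\ndef g():\n    return 1\n"
    try:
        namespace = {}
        exec(src, namespace)
        results["pep614_decorators"] = namespace["g"]() == 1
    except SyntaxError:
        results["pep614_decorators"] = False

    # "".replace("", s, n) returns s for non-zero n since 3.9.
    results["empty_replace"] = "".replace("", "x", 1) == "x"

    return results


if __name__ == "__main__":
    for name, supported in sorted(probe_features().items()):
        print(name, "yes" if supported else "no")
```

Run on the interpreter under test, this prints one line per probed feature; the set of probes would need extending to cover the remaining items.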
Changes to MicroPython built-in modules
- asyncio
  - Due to significant security concerns, the reuse_address parameter of asyncio.loop.create_datagram_endpoint() is no longer supported
  - Added a new coroutine shutdown_default_executor() that schedules a shutdown for the default executor that waits on the ThreadPoolExecutor to finish closing. Also, asyncio.run() has been updated to use the new coroutine.
  - Added asyncio.PidfdChildWatcher, a Linux-specific child watcher implementation that polls process file descriptors
  - Added a new coroutine asyncio.to_thread()
  - When cancelling the task due to a timeout, asyncio.wait_for() will now wait until the cancellation is complete also in the case when timeout is <= 0, like it does with positive timeouts.
  - asyncio now raises TypeError when calling incompatible methods with an ssl.SSLSocket socket
- gc
  - Garbage collection does not block on resurrected objects
  - Added a new function gc.is_finalized() to check if an object has been finalized by the garbage collector
- math
  - Expanded the math.gcd() function to handle multiple arguments. Formerly, it only supported two arguments.
  - Added math.lcm(): return the least common multiple of specified arguments
  - Added math.nextafter(): return the next floating-point value after x towards y
  - Added math.ulp(): return the value of the least significant bit of a float
- os
  - Exposed the Linux-specific os.pidfd_open() and os.P_PIDFD
  - The os.unsetenv() function is now also available on Windows
  - The os.putenv() and os.unsetenv() functions are now always available
  - Added os.waitstatus_to_exitcode() function: convert a wait status to an exit code
- random
  - Added a new random.Random.randbytes method: generate random bytes
- sys
  - Added a new sys.platlibdir attribute: name of the platform-specific library directory
  - Previously, sys.stderr was block-buffered when non-interactive. Now stderr defaults to always being line-buffered.
(Changes to non-built-in modules will need to be documented elsewhere.)
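To illustrate the semantics of a few of the module additions listed above, here is a pure-Python sketch of multi-argument gcd, lcm and randbytes. The names gcd_multi, lcm_multi and randbytes_fallback are hypothetical and not part of any module; this is not how CPython or MicroPython implements them, only a reference for the expected results. It assumes random.getrandbits() is available and accepts a width of 8.

```python
import random


def gcd_multi(*integers):
    """Greatest common divisor of any number of integers; no arguments gives 0."""
    result = 0
    for value in integers:
        a, b = result, abs(value)
        while b:                      # Euclid's algorithm
            a, b = b, a % b
        result = a
    return result


def lcm_multi(*integers):
    """Least common multiple of any number of integers; no arguments gives 1."""
    result = 1
    for value in integers:
        if value == 0:
            return 0                  # any zero argument makes the result 0
        result = result // gcd_multi(result, value) * abs(value)
    return result


def randbytes_fallback(n):
    """Return n random bytes, drawn one byte at a time from getrandbits(8)."""
    return bytes(random.getrandbits(8) for _ in range(n))
```

For example, gcd_multi(12, 18, 24) gives 6 and lcm_multi(4, 6, 10) gives 60, which matches what math.gcd() and math.lcm() return in CPython 3.9.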
Unicode support and PEP 393
Opening this as a discussion issue, so it can all be kept track of.
Python 3.3's str type supports the full Unicode range, with semantics defined by PEP 393 http://www.python.org/dev/peps/pep-0393/ (although some of the details there are CPython-specific). Currently, MicroPython pretends that strings are bytes, C-style, and will output them to a console without modification, so, for instance, a Unix console using UTF-8 will interpret "\xC3\xBD" as U+00FD LATIN SMALL LETTER Y WITH ACUTE. (I have no idea what embedded devices do, but presumably it's ASCII-compatible or this issue would have come up long ago.)
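To make the byte/character distinction concrete, here is a small sketch in CPython 3 terms (it shows the target semantics, not what MicroPython currently does). The helper name describe is made up for illustration.

```python
def describe(raw_bytes):
    """Return (byte_count, code_points) for a UTF-8 encoded byte string."""
    decoded = raw_bytes.decode("utf-8")
    return len(raw_bytes), [hex(ord(ch)) for ch in decoded]


raw = b"\xC3\xBD"            # two bytes, as written to the console
text = raw.decode("utf-8")   # a single character once decoded as UTF-8

print(describe(raw))
```

The two bytes make up one code point, U+00FD, so len(raw) is 2 while len(text) is 1.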
Ideally and ultimately, MicroPython should support all of Unicode. The advantages to the language are huge (if you need me to elaborate, I can do so); in brief, Python 3 forces everyone to be correct. Correctness in Unicode is on par with correctness in memory management; it has some costs, but we willingly pay those costs as the price of guaranteeing that we won't leak memory or have buffer overruns.
But if that can't be done, or can't be done immediately, I'd like to see some means of catching problems before they happen; for instance, documenting that all encodings used MUST be ASCII-compatible, and raising an exception if a str has any character >127 in it.
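As a rough illustration of that stop-gap, written in Python rather than in the C where it would actually live: the function name check_ascii and the choice of ValueError are my assumptions, not anything that exists in the codebase.

```python
def check_ascii(s):
    """Return s unchanged, or raise ValueError at the first character above 127."""
    for index, ch in enumerate(s):
        if ord(ch) > 127:
            raise ValueError(
                "non-ASCII character U+%04X at index %d" % (ord(ch), index)
            )
    return s
```

With this, check_ascii("spam") passes through untouched, while check_ascii("sp\u00fdm") raises at index 2 instead of silently emitting bytes the console may misinterpret.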
I've had a bit of a look at objstr.c, and it seems that the character/byte equivalence is, unfortunately, endemic. Not only is the representation all byte-based, but helpers like is_ws() are defined by ASCII. (In CPython, "spam\xA0spam\u3000spam".split() == ["spam","spam","spam"], because U+00A0 and U+3000 are flagged whitespace.) This could be changed, but it will likely mean significant changes, and will almost certainly result in code size increases; although one of the beauties of PEP 393 strings is that, for ASCII-only strings (and even Latin-1 strings), the string in memory is no larger than it would be if stored as bytes (modulo the two-bit flag in the header, stating what the size is).
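For reference, the width selection PEP 393 describes can be sketched in a few lines of Python. The function name pep393_width is hypothetical, and this only models the choice of 1, 2 or 4 bytes per character, not the actual header layout.

```python
def pep393_width(s):
    """Bytes per character a PEP 393 style representation would pick for s."""
    highest = max(map(ord, s)) if s else 0
    if highest < 0x100:
        return 1    # ASCII and Latin-1 fit in one byte each
    if highest < 0x10000:
        return 2    # rest of the Basic Multilingual Plane
    return 4        # supplementary planes
```

So "spam" and "spam\xA0" both stay at one byte per character, "\u3000" needs two, and anything outside the BMP needs four. The whitespace behaviour mentioned above can be checked in CPython with "spam\xA0spam\u3000spam".split().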
The most important question is, how much do other parts of the code dip into strings, and therefore how much impact will a change of internal representation have? I tried adding an arbitrary member to the structure, and it seemed to compile okay, and there don't seem to be any other files referencing the structure directly.
How do you feel about me doing up some approximation of PEP 393 into objstr.c? It'd be a fairly significant change.