Discussion of Python 3.7 support
This issue is intended to track the status of Python 3.7 core features as implemented by MicroPython. Not all of these changes necessarily need to be implemented in MicroPython, but documenting their status is important.
Python 3.7.0 (final) was released on 27 June 2018. The features for 3.7 are defined in PEP 537, and an explanation of the changes can be found in What's New in Python 3.7.
- PEP 538 - Coercing the legacy C locale to a UTF-8 based locale
- PEP 539 - A New C-API for Thread-Local Storage in CPython
- PEP 540 - UTF-8 mode
- PEP 552 - Deterministic pyc
- PEP 553 - Built-in breakpoint()
- PEP 557 - Data Classes
- PEP 560 - Core support for typing module and generic types
- PEP 562 - Module __getattr__ and __dir__; see partial implementation (__getattr__): 454cca6016afc96deb6d1ad5d1b3553ab9ad18dd
- PEP 563 - Postponed Evaluation of Annotations
- PEP 564 - Time functions with nanosecond resolution; see partial implementation: d4b61b00172ccc231307e3ef33f66f28cb6b051f
- PEP 565 - Show DeprecationWarning in __main__
- PEP 567 - Context Variables
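To illustrate what PEP 564 in the list above is about, here is a small sketch of why a float of seconds cannot carry a nanosecond timestamp exactly. It assumes a CPython 3.7+ style interpreter. The helper names float_round_trip and now_ns are made up for this example, and the fallback used when time.time_ns() is missing is only an assumption about how a port might degrade, not how MicroPython does it.

```python
import time

def float_round_trip(t_ns):
    """Convert integer nanoseconds to float seconds and back again."""
    seconds = t_ns / 1e9            # the float form used by time.time()
    return int(seconds * 1e9)       # precision has already been lost here

def now_ns():
    """Return the current time as integer nanoseconds.

    Uses time.time_ns() where it exists (PEP 564); otherwise falls back to
    the float clock, which is only an approximation.
    """
    if hasattr(time, "time_ns"):
        return time.time_ns()
    return int(time.time() * 1e9)

# A fixed example timestamp (mid-2018) with nanosecond detail.
sample_ns = 1530000000123456789
print(sample_ns, float_round_trip(sample_ns))
```

The two printed numbers differ in their low digits, which is the loss that the integer-returning functions avoid.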
Other language changes
- async and await are now reserved keywords
- dict objects must preserve insertion-order
- More than 255 arguments can now be passed to a function, and a function can now have more than 255 parameters
- bytes.fromhex() and bytearray.fromhex() now ignore all ASCII whitespace, not only spaces
- str, bytes, and bytearray gained support for the new isascii() method, which can be used to test if a string or bytes contain only the ASCII characters
- ImportError now displays module name and module __file__ path when from ... import ... fails
- Circular imports involving absolute imports with binding a submodule to a name are now supported
- object.__format__(x, '') is now equivalent to str(x) rather than format(str(self), '')
- In order to better support dynamic creation of stack traces, types.TracebackType can now be instantiated from Python code, and the tb_next attribute on tracebacks is now writable
- When using the -m switch, sys.path[0] is now eagerly expanded to the full starting directory path, rather than being left as the empty directory (which allows imports from the current working directory at the time when an import occurs)
- The new -X importtime option or the PYTHONPROFILEIMPORTTIME environment variable can be used to show the timing of each module import
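A few of the behaviours in the list above can be checked from Python itself. The following is only a sketch of such a check, not an official test: the function name check_language_changes and its labels are invented here, and the except clauses are a guess at how a port lacking a feature might fail.

```python
def check_language_changes():
    """Return {label: bool} for a few of the 3.7 behaviours listed above."""
    results = {}

    # dict must preserve insertion order.
    d = {}
    for key in ("b", "a", "c"):
        d[key] = None
    results["dict order"] = list(d) == ["b", "a", "c"]

    # fromhex() ignores all ASCII whitespace, not only spaces.
    try:
        results["fromhex whitespace"] = (
            bytes.fromhex("de\tad\nbe ef") == b"\xde\xad\xbe\xef"
        )
    except (ValueError, AttributeError):
        results["fromhex whitespace"] = False

    # str gained isascii().
    try:
        results["isascii"] = "spam".isascii() and not "sp\u00e4m".isascii()
    except AttributeError:
        results["isascii"] = False

    # object.__format__(x, '') is equivalent to str(x).
    try:
        results["object.__format__"] = object.__format__(42, "") == str(42)
    except (TypeError, AttributeError):
        results["object.__format__"] = False

    return results

for label, ok in check_language_changes().items():
    print(label, "ok" if ok else "missing or different")
```

Running the same script on CPython and on a MicroPython port gives a quick side-by-side view of which items still need attention.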
Changes to MicroPython built-in modules
- asyncio (many, may need a separate ticket)
- gc - New features: gc.freeze(), gc.unfreeze(), gc.get_freeze_count()
- math - math.remainder() added to implement IEEE 754-style remainder
- re - A number of tidy up features including better support for splitting on empty strings and copy support for compiled expressions and match objects
- sys - sys.breakpointhook() added. sys.get(/set)_coroutine_origin_tracking_depth() added.
- time - Mostly updates to support nanosecond resolution in PEP 564, see above.
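For the math item above, the IEEE 754-style remainder differs from fmod and from the % operator because the quotient is rounded to the nearest integer (ties to even) rather than truncated. Below is a rough pure-Python sketch of that rule. The name ieee_remainder is hypothetical, and the shortcut through round(x / y) is an assumption that only holds for small, exactly representable inputs; it is not how CPython implements math.remainder().

```python
import math

def ieee_remainder(x, y):
    """Approximate IEEE 754-style remainder: x - n*y, n nearest integer to x/y.

    Python's round() sends halves to the even integer, which matches the
    tie-breaking rule. The division x / y is itself rounded, so this sketch
    is only trustworthy for small, exactly representable inputs.
    """
    n = round(x / y)
    return float(x - n * y)

# Compare against the truncating remainder from math.fmod().
for x, y in ((5, 3), (7, 2), (10, 4)):
    print(x, y, ieee_remainder(x, y), math.fmod(x, y))
```

Note that the result can be negative even when both inputs are positive, which is one reason it cannot simply be aliased to an existing function.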
(Changes to non-built-in modules will need to be documented elsewhere.)
Unicode support and PEP 393
Opening this as a discussion issue, so it can all be kept track of.
Python 3.3's str type supports the full Unicode range, with semantics defined by PEP 393 http://www.python.org/dev/peps/pep-0393/ (although some of the details there are CPython-specific). Currently, MicroPython pretends that strings are bytes, C-style, and will output them to a console without modification. So, for instance, a UTF-8 Unix console will interpret "\xC3\xBD" as U+00FD LATIN SMALL LETTER Y WITH ACUTE. (I have no idea what embedded devices do, but presumably it's ASCII-compatible or this issue would have come up long ago.)
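To make the byte/character distinction concrete, here is a short sketch run under CPython 3, assuming the console in question is UTF-8. The same two bytes are one character to a UTF-8 reader and two characters to a Latin-1 reader.

```python
raw = b"\xc3\xbd"                   # the two bytes from the example above

as_utf8 = raw.decode("utf-8")       # how a UTF-8 console reads them
as_latin1 = raw.decode("latin-1")   # how a Latin-1 console would read them

# Byte count, then character counts under each interpretation.
print(len(raw), len(as_utf8), len(as_latin1))
# Code point of the single character produced by the UTF-8 decode.
print(hex(ord(as_utf8)))
```

A string type that is really bytes reports the first length for everything, which is where indexing and slicing start to disagree with CPython.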
Ideally and ultimately, MicroPython should support all of Unicode. The advantages to the language are huge (if you need me to elaborate, I can do so); in brief, Python 3 forces everyone to be correct. Correctness in Unicode is on par with correctness in memory management; it has some costs, but we willingly pay those costs as the price of guaranteeing that we won't leak memory or have buffer overruns.
But if that can't be done, or can't be done immediately, I'd like to see some means of catching problems before they happen; for instance, documenting that all encodings used MUST be ASCII-compatible, and raising an exception if a str has any character >127 in it.
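The stopgap I have in mind would behave roughly like the following. This is written in Python purely for illustration; the real check would live in C, and the name require_ascii and the choice of ValueError are assumptions made for this sketch.

```python
def require_ascii(s):
    """Raise ValueError if s contains any character above U+007F.

    Hypothetical helper: the name and the exception type are assumptions
    made for this sketch.
    """
    for index, ch in enumerate(s):
        if ord(ch) > 127:
            raise ValueError(
                "non-ASCII character U+%04X at index %d" % (ord(ch), index)
            )
    return s

require_ascii("spam")               # passes through unchanged
try:
    require_ascii("sp\u00fdm")
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly like this at least turns silent mojibake into an error that points at the offending character.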
I've had a bit of a look at objstr.c, and it seems that the character/byte equivalence is, unfortunately, endemic. Not only is the representation all byte-based, but helpers like is_ws() are defined by ASCII. (In CPython, "spam\xA0spam\u3000spam".split() == ["spam","spam","spam"], because U+00A0 and U+3000 are flagged whitespace.) This could be changed, but it will likely mean significant changes and will almost certainly increase code size. That said, one of the beauties of PEP 393 strings is that, for ASCII-only strings (and even Latin-1 strings), the string in memory is no larger than it would be if stored as bytes (modulo the two-bit flag in the header, stating what the size is).
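As a rough model of the PEP 393 idea, the width per character is chosen from the highest code point in the string. The sketch below is Python for illustration only; pep393_width and payload_size are made-up names, and object header overhead is ignored entirely.

```python
def pep393_width(s):
    """Bytes per character a PEP 393 style string would use for s."""
    highest = max(map(ord, s)) if s else 0
    if highest < 0x100:        # ASCII and Latin-1
        return 1
    if highest < 0x10000:      # rest of the Basic Multilingual Plane
        return 2
    return 4                   # supplementary planes

def payload_size(s):
    """Character data size in bytes, ignoring the object header."""
    return len(s) * pep393_width(s)

for text in ("spam", "sp\u00fdm", "spam\u3000spam", "\U0001F40D"):
    print(repr(text), pep393_width(text), payload_size(text))

# The whitespace example from the paragraph above, as CPython behaves:
print("spam\xA0spam\u3000spam".split())
```

The first two strings cost one byte per character, so code that only ever handles ASCII or Latin-1 pays nothing extra for the character data.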
The most important question is, how much do other parts of the code dip into strings, and therefore how much impact will a change of internal representation have? I tried adding an arbitrary member to the structure, and it seemed to compile okay, and there don't seem to be any other files referencing the structure directly.
How do you feel about me working some approximation of PEP 393 into objstr.c? It'd be a fairly significant change.