QUERY · ISSUE

PEP 634 Structural Pattern Matching (Python 3.10) support

open · by nickovs · opened 2021-09-23 · updated 2023-06-13

py-core

CPython 3.10 release candidate 2 is now out, and the final release is less than two weeks away. The biggest change in the release is support for the new Structural Pattern Matching syntax. This is by far the most substantial language change Python has seen since the arrival of the walrus operator :=, and probably since the release of Python 3.

(Those not familiar with Structural Pattern Matching should probably start by reading PEP 635, which covers the motivation for this new language feature. There are separate PEPs for the full specification (PEP 634) and a tutorial (PEP 636).)

Structural Pattern Matching is a big and fairly complex feature; adding it to MicroPython would be a fair amount of work and would add quite a bit of code to the parser. How much work and how much code are open questions, as is the question of when, if ever, it will be worth doing. The answer to this last question will depend very much on how rapidly users of CPython embrace, and come to depend on, this new feature. That said, the new syntax can save a lot of source code typing; if it gets rapid and widespread uptake, it would represent a substantial incompatibility between MicroPython and CPython.

Given how significant an addition this is to the Python syntax, it seems like it would be a good idea to understand the work involved and the scope of the change well before use of it in CPython becomes widespread. I am opening this issue to serve as a place to discuss those topics.

CANDIDATE · ISSUE

Unicode support and PEP 393

closed · by Rosuav · opened 2014-06-03 · updated 2014-06-28

Opening this as a discussion issue, so it can all be kept track of.

Python 3.3's str type supports the full Unicode range, with semantics defined by PEP 393 http://www.python.org/dev/peps/pep-0393/ (although some of the details there are CPython-specific). Currently, micropython pretends that strings are bytes, C-style, and will output them to a console without modification - so, for instance, a Unix console will interpret "\xC3\xBD" as U+00FD LATIN SMALL LETTER Y WITH ACUTE. (I have no idea what embedded devices do, but presumably it's ASCII-compatible or this issue would have come up long ago.)
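The console behaviour described above can be reproduced in CPython; this is a minimal sketch, with variable names chosen only for illustration. It shows that the same two bytes are one character when read as UTF-8 and two characters when each byte is treated as a character.

```python
raw = b"\xC3\xBD"

# A UTF-8 console decodes the pair as a single code point, U+00FD.
as_utf8 = raw.decode("utf-8")

# Treating each byte as a character (Latin-1 style) gives two characters.
as_bytes_are_chars = raw.decode("latin-1")

print(len(as_utf8), hex(ord(as_utf8)))
print(len(as_bytes_are_chars))
```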

Ideally and ultimately, micropython should support all of Unicode. The advantages to the language are huge (if you need me to elaborate, I can do so); in brief, Python 3 forces everyone to be correct. Correctness in Unicode is on par with correctness in memory management; it has some costs, but we willingly pay those costs as the price of guaranteeing that we won't leak memory or have buffer overruns.

But if that can't be done, or can't be done immediately, I'd like to see some means of catching problems before they happen; for instance, documenting that all encodings used MUST be ASCII-compatible, and raising an exception if a str has any character >127 in it.
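As a rough illustration of that fallback, the check could behave like the sketch below. It is written in Python for readability; `require_ascii` is a hypothetical name, and the real check would presumably live in the C string code rather than look like this.

```python
def require_ascii(s):
    """Return s unchanged, or raise if it holds any character above 127.

    Hypothetical helper, shown only to illustrate the proposed behaviour.
    """
    for index, ch in enumerate(s):
        if ord(ch) > 127:
            raise ValueError(
                f"non-ASCII character U+{ord(ch):04X} at index {index}"
            )
    return s


print(require_ascii("spam"))
```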

I've had a bit of a look at objstr.c, and it seems that the character/byte equivalence is, unfortunately, endemic. Not only is the representation all byte-based, but helpers like is_ws() are defined by ASCII. (In CPython, "spam\xA0spam\u3000spam".split() == ["spam","spam","spam"], because U+00A0 and U+3000 are flagged whitespace.) This could be changed, but it will likely mean significant changes, and will almost certainly result in code size increases; although one of the beauties of PEP 393 strings is that, for ASCII-only strings (and even Latin-1 strings), the string in memory is no larger than it would be if stored as bytes (modulo the two-bit flag in the header, stating what the size is).
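Both points can be sketched in CPython. The first line checks the whitespace behaviour quoted above; the function then mimics the PEP 393 idea of picking the narrowest per-character width that fits the widest code point in the string. `bytes_per_char` is a hypothetical helper written for this illustration, not CPython's or MicroPython's actual implementation.

```python
# str.split() with no arguments splits on Unicode whitespace, not just ASCII.
parts = "spam\xA0spam\u3000spam".split()


def bytes_per_char(s):
    """Width PEP 393 would choose for s: 1, 2 or 4 bytes per character.

    Hypothetical helper illustrating the selection rule only.
    """
    highest = max(map(ord, s), default=0)
    if highest < 0x100:      # ASCII and Latin-1
        return 1
    if highest < 0x10000:    # rest of the Basic Multilingual Plane
        return 2
    return 4                 # astral code points


print(parts)
print(bytes_per_char("spam"), bytes_per_char("spam\u3000"), bytes_per_char("\U0001F600"))
```

Under this rule an ASCII-only or Latin-1 string costs one byte per character, which is why the representation need not grow for the strings most embedded code uses.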

The most important question is, how much do other parts of the code dip into strings, and therefore how much impact will a change of internal representation have? I tried adding an arbitrary member to the structure, and it seemed to compile okay, and there don't seem to be any other files referencing the structure directly.

How do you feel about me doing up some approximation of PEP 393 into objstr.c? It'd be a fairly significant change.
