QUERY · ISSUE
unix-ffi re module throws when looking at match containing optional groups
Problem description
When a regex contains an optional capture group, for example (a)?b, and that group does not take part in the match, the PCREMatch.group() method throws an OverflowError for that group. I believe this is because PCRE represents an unset group as SIZE_MAX and re isn't checking for that.
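A minimal sketch of the kind of check that appears to be missing, assuming the match stores a flat list of (start, end) offset pairs and that unset groups carry a SIZE_MAX sentinel. The names group_from_offsets and offsets are hypothetical and are not taken from re.py:

```python
import struct

# Assumption: the "unset" sentinel equals SIZE_MAX for a pointer-sized size_t.
SIZE_MAX = (1 << (8 * struct.calcsize("P"))) - 1


def group_from_offsets(subject, offsets, n, unset=SIZE_MAX):
    """Return group n of a match, or None if the group did not participate."""
    start, end = offsets[2 * n], offsets[2 * n + 1]
    if start == unset or end == unset:
        # Unset group: report None instead of slicing with a huge index.
        return None
    return subject[start:end]


# Offsets as they might look for (a)?b matched against 'b' (illustrative values).
offsets = [0, 1, SIZE_MAX, SIZE_MAX]
result = (group_from_offsets("b", offsets, 0), group_from_offsets("b", offsets, 1))
print(result)
```

Comparing against the sentinel before slicing keeps the out-of-range value from ever being converted to a machine word.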
Additional fallout
The unix-ffi json library requires the unix-ffi re library, and currently cannot parse numbers unless they have an integer part, a fractional part, and an exponential part; instead, it throws this same error.
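To illustrate why number parsing is affected, here is a sketch using CPython's re and a hypothetical number pattern (NUMBER is for illustration only and is not the json library's actual regex). Only a number with all three parts leaves every group set:

```python
import re

# Hypothetical pattern: integer part, optional fraction, optional exponent.
NUMBER = re.compile(r"(-?\d+)(\.\d+)?([eE][-+]?\d+)?")

full = NUMBER.match("1.5e3").groups()  # every group takes part in the match
bare = NUMBER.match("42").groups()     # fraction and exponent groups are unset

print(full)
print(bare)
```

Any code that reads the fraction or exponent group of a bare integer hits the unset-group case described above.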
To reproduce
>>> import re
>>> r = re.compile(r'(a)?b')
>>> m = r.match('b')
>>> (m.group(0), m.group(1))
Expected (CPython and the MicroPython built-in re library):
('b', None)
Actual (MicroPython unix-ffi re library):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/micropython/re.py", line 65, in group
OverflowError: overflow converting long int to machine word
CANDIDATE · ISSUE
Regex results different from CPython
extmod
import re
m = re.match("([^\s]+)\s*([^\s]+)", "5 1000")
print("g1", m.group(1), "g2", m.group(2))
CPython (tried with 3.7 and 3.8, on Windows and Linux): g1 5 g2 1000, which is correct as far as I can tell.
MicroPython v1.17-92-gf4c1389fb (msvc and linux ports): g1 5 100 g2 0, which looks like a bug?
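One possible reading of the MicroPython output, offered only as a guess and not confirmed by anything in this report: the same result appears in CPython if \s inside the character class is taken literally, so that [^\s] excludes a backslash and the letter s rather than whitespace. The sketch below emulates that by escaping the backslash:

```python
import re

subject = "5 1000"

# Pattern as written in the report: [^\s] means "any non-whitespace character".
intended = re.match(r"([^\s]+)\s*([^\s]+)", subject).groups()

# Assumed misreading: the class excludes a literal backslash and 's' instead,
# so the first group greedily takes the space too and backtracks by one character.
literal = re.match(r"([^\\s]+)\s*([^\\s]+)", subject).groups()

print(intended)
print(literal)
```

If that reading is right, the difference comes from how escapes are handled inside character classes rather than from group extraction.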