search() on a compiled regular expression ignores the "pos" argument
With MicroPython v1.16 (on 2021-06-18; Raspberry Pi Pico with RP2040), I observe the following unexpected output from .search() with a pos argument on a compiled regular expression:
>>> import re
>>> RE = re.compile("a")
>>> print(RE.search("ab", 1) is None)
False # Should be True
The result is the same on MicroPython v1.12-1 (2020-02-09, Linux version). Contrast with CPython 3.8.10:
>>> import re
>>> RE = re.compile("a")
>>> print(RE.search("ab", 1) is None)
True
It seems that the pos parameter is ignored in MicroPython. For example, the match is still reported at the start of the string:
>>> print(RE.search("aa", 1).span())
(0, 1) # Should be (1, 2)
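Until this is fixed, one possible workaround is to slice the string and shift the reported span by hand. This is only a sketch: search_from is a hypothetical helper name, not part of the re module, and it assumes .span() is available on the match object (as in the session above). Slicing is also not fully equivalent to a real pos argument, since "^" would match at the start of the slice.

```python
import re

def search_from(pattern, text, pos=0):
    """Hypothetical helper: approximate pattern.search(text, pos).

    Returns the (start, end) span relative to the full string,
    or None if there is no match at or after pos.
    """
    m = pattern.search(text[pos:])
    if m is None:
        return None
    start, end = m.span()
    # Shift the span back into the coordinates of the original string.
    return (start + pos, end + pos)

RE = re.compile("a")
print(search_from(RE, "ab", 1))
print(search_from(RE, "aa", 1))
```

With CPython semantics the first call finds no match and the second reports the second "a".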
Assertion error in re1.5
Build the Unix port with DEBUG=1, then try to match a particular regular expression.
src/micropython/ports/unix$ ./micropython
MicroPython v1.14-122-g9fef1c0bd-dirty on 2021-03-27; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import ure as re
>>> re.match(63*"a" + "|", "x")
micropython: ../../extmod/re1.5/recursiveloop.c:79: recursiveloop: Assertion `!"recursiveloop"' failed.
Aborted
The original regular expression I encountered this with was a long series of alternatives of the form "literal|literal|..." that served a real purpose; the above is a minimized version.
I'm guessing the relative offset of a split grows large enough that it wraps around to a negative value; with 62 repetitions of 'a', everything is 'okay'.
I think this is an overflow of a signed 8-bit offset. When additionally defining MICROPY_PY_URE_DEBUG, the disassembly of the not-quite-crashing expression can be seen to begin:
>>> re.compile(62*"a" + "|", re.DEBUG)
0: rsplit 5 (3)
2: any
3: jmp 0 (-5)
5: save 0
7: split 135 (126)
9: char a
while the crashing version has a negative split target at instruction 7:
>>> re.compile(63*"a" + "|", re.DEBUG)
0: rsplit 5 (3)
2: any
3: jmp 0 (-5)
5: save 0
7: split -119 (-128)
9: char a
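The two listings are consistent with a relative offset stored in a signed 8-bit field. The sketch below reproduces the numbers. It is an inference from the disassembly, not taken from the re1.5 source: as_int8 and split_offset are hypothetical names, and the formula 2 * n + 2 (two bytes per "char" instruction plus two more bytes) is simply fitted to the two cases shown.

```python
def as_int8(value):
    """Interpret the low 8 bits of value as a signed two's-complement byte."""
    value &= 0xFF
    return value - 0x100 if value >= 0x80 else value

def split_offset(n_chars):
    # Assumption fitted to the listings: 2 bytes per "char" instruction,
    # plus 2 further bytes before the split target.
    return 2 * n_chars + 2

NEXT_PC = 9  # address of the instruction following the split at 7

for n in (62, 63):
    rel = as_int8(split_offset(n))
    print(n, "split", NEXT_PC + rel, "(%d)" % rel)
```

For 62 repetitions the offset is 126 and still fits; for 63 it is 128, which does not fit in a signed byte and wraps to -128, giving the target -119 seen above.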
Knowing all this, it looks like my regular expression is simply too much for re1.5. However, maybe the problem could be detected at compile time and reported as an exception rather than an assertion failure (or, probably, unpredictable behavior at runtime if assertions are disabled).