mpy-tool does not properly collect module names for imports
Consider an application that has the following module file:
foo/bar/quux.py
Put an import statement in some other file:
from foo.bar import quux
When freezing the module with mpy-tool.py, the names foo.bar and quux are collected. However:
foo.bar.quuxis not collected, which will be created as a qstr at runtime, becausesys.modulesuses it as a key- neither
foonorbaris collected.barwill be created as a qstr at runtime, because the dict offooneeds to insert it as a key
Of course, if we use relative imports, there's a similar problem: in foo/bar/baz.py, from . import quux should (but can't really know to) also collect foo.bar.quux despite the string not existing anywhere in the file.
I'm not sure if this is something to solve in mpy-tool itself, or whether there should be a separate step that collects symbols in this way. But it seems that using file names to generate both the fully qualified module name and the individual components would be the right thing to do. (that is basically the workaround i'm using: generate all_modules.py that walk the filesystem and convert every py file name to import that.file.name, which collects "that.file.name", and that.file.name which collects "that"", "file" and "name")
tools/mpy-tool.py: Clean up escaped compiled module name.
Summary
This PR lets "mpy-tool.py" filter unwanted characters that may end up in the compiled module name, even after the first encoding pass.
Certain characters are already ASCII-safe but cannot appear in a C identifier (eg. dots, dashes, etc.). Those characters, when appearing in a module name, won't be escaped and will be placed in the output C source code template when freezing files - leading to an invalid source file.
Now the code, after the first character encoding pass, will pre-process the compiled module name to use when filling out the C source code template. Every character that is not expected to be part of a C identifier will be replaced with an underscore.
This closes #3445.
Testing
A Python file was frozen with a custom module name containing characters that cannot be part of a C identifier, and the generated C file was compiled to make sure all generated identifiers are compliant:
$ cd ports/unix
$ make
$ printf "class Hello:\n def __init__(self):\n pass\n" > test.py
$ ../../mpy-cross/build/mpy-cross -o test.mpy -s "test.mpy" test.py
$ ../../tools/mpy-tool.py -f -q build-standard/genhdr/qstrdefs.preprocessed.h test.mpy > frozen.c
$ gcc -I. -I../.. -Ivariants/standard -Ibuild-standard -DMPZ_DIG_SIZE=16 -c frozen.c
$ size frozen.o
text data bss dec hex filename
155 176 0 331 14b frozen.o
Trade-offs and Alternatives
This PR does not help recovering from cases where the module gets named with incompatible names (eg. starting with a digit), but that wouldn't have worked in the first place.
Generative AI
I did not use generative AI tools when creating this PR.