QUERY · ISSUE

mpy-tool does not properly collect module names for imports

open · by matejcik · opened 2021-04-19 · updated 2024-09-13

needs-info

Consider an application that has the following module file:
foo/bar/quux.py

Put an import statement in some other file:

from foo.bar import quux

When freezing the module with mpy-tool.py, the names foo.bar and quux are collected. However:

foo.bar.quux is not collected, so it will be created as a qstr at runtime, because sys.modules uses it as a key
neither foo nor bar is collected; bar will be created as a qstr at runtime, because the dict of foo needs to insert it as a key
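
To make the gap concrete, here is a minimal sketch of the strings one module file could contribute if names were derived from its path. The helper name names_for_module is hypothetical and is not part of mpy-tool.py. Including every dotted prefix (not just the full name) is an assumption, based on sys.modules being keyed by dotted names.

```python
from pathlib import PurePosixPath


def names_for_module(path):
    """Return candidate qstr strings for one module file (hypothetical helper).

    Assumption: both the individual path components and every dotted prefix
    may be needed at runtime, as described in the issue above.
    """
    parts = list(PurePosixPath(path).with_suffix("").parts)
    # A package's __init__.py is imported under the package name itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    names = set(parts)
    for i in range(1, len(parts) + 1):
        names.add(".".join(parts[:i]))
    return names


print(sorted(names_for_module("foo/bar/quux.py")))
```

For foo/bar/quux.py this yields foo, bar, quux, foo.bar and foo.bar.quux. Per the report, only foo.bar and quux are currently collected.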

Of course, if we use relative imports, there's a similar problem: in foo/bar/baz.py, from . import quux should (but can't really know to) also collect foo.bar.quux despite the string not existing anywhere in the file.
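
As an illustration of why the relative case needs the file's location, the sketch below resolves relative imports against the package implied by the file path, using the standard ast module and importlib.util.resolve_name. It runs under CPython as a host-side tool. The function names are hypothetical, and this is not how mpy-tool.py currently works.

```python
import ast
import importlib.util
from pathlib import PurePosixPath


def package_of(path):
    """Dotted package name implied by a module's file path (assumed layout)."""
    parts = PurePosixPath(path).with_suffix("").parts
    return ".".join(parts[:-1])


def relative_import_targets(source, path):
    """List absolute dotted names that the relative imports in `source` may refer to."""
    package = package_of(path)
    targets = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            base = "." * node.level + (node.module or "")
            sep = "." if node.module else ""
            for alias in node.names:
                # The result is only a candidate: the imported name may be
                # a submodule or a plain attribute of the package.
                targets.append(
                    importlib.util.resolve_name(base + sep + alias.name, package)
                )
    return targets


print(relative_import_targets("from . import quux\n", "foo/bar/baz.py"))
```

The string foo.bar.quux never appears in baz.py, so it can only be produced by combining the import statement with the path of the file that contains it.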

I'm not sure if this is something to solve in mpy-tool itself, or whether there should be a separate step that collects symbols in this way. But it seems that using file names to generate both the fully qualified module name and the individual components would be the right thing to do. (That is basically the workaround I'm using: generate an all_modules.py that walks the filesystem and converts every .py file name into the statement import that.file.name, which collects "that.file.name", followed by the expression that.file.name, which collects "that", "file" and "name".)
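
A rough sketch of such a generator is shown below. It is not the reporter's actual script. The function name all_modules_lines is hypothetical, as are the choices to skip file names that are not valid identifiers and to import a package's __init__.py under the package name.

```python
import keyword
import os


def all_modules_lines(root):
    """Build the lines of an all_modules.py-style file for every .py under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            parts = rel[:-3].split(os.sep)
            if parts[-1] == "__init__":
                parts.pop()  # a package is imported by its directory name
            # Skip anything that could not appear in an import statement.
            if not parts or not all(
                p.isidentifier() and not keyword.iskeyword(p) for p in parts
            ):
                continue
            dotted = ".".join(parts)
            lines.append("import " + dotted)  # collects the full dotted name
            lines.append(dotted)  # attribute access collects each component
    return lines
```

If the generated file is written inside the scanned tree, it would pick itself up on the next run, so it is probably simplest to write it elsewhere or exclude it by name.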

CANDIDATE · PULL REQUEST

tools/mpy-tool.py: Clean up escaped compiled module name.

open · by agatti · opened 2026-03-14 · updated 2026-03-16

tools

Summary

This PR lets "mpy-tool.py" filter unwanted characters that may end up in the compiled module name, even after the first encoding pass.

Certain characters are already ASCII-safe but cannot appear in a C identifier (e.g. dots, dashes, etc.). When such characters appear in a module name, they are not escaped and are placed as-is into the output C source code template when freezing files, leading to an invalid source file.

Now the code, after the first character encoding pass, will pre-process the compiled module name to use when filling out the C source code template. Every character that is not expected to be part of a C identifier will be replaced with an underscore.
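
The replacement step can be pictured roughly as follows. This is a sketch of the behaviour described above, not the code from the PR. The function name c_identifier_fragment and the exact character class are assumptions.

```python
import re


def c_identifier_fragment(name):
    """Replace every character that cannot appear in a C identifier with '_'.

    Assumption: only ASCII letters, digits and underscores are kept. A leading
    digit is left untouched, matching the limitation noted under Trade-offs.
    """
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


print(c_identifier_fragment("test.mpy"))
```

With the module name used in the Testing section, test.mpy, the dot becomes an underscore, so the result can be embedded in a generated C symbol name.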

This closes #3445.

Testing

A Python file was frozen with a custom module name containing characters that cannot be part of a C identifier, and the generated C file was compiled to make sure all generated identifiers are compliant:

$ cd ports/unix
$ make
$ printf "class Hello:\n  def __init__(self):\n    pass\n" > test.py
$ ../../mpy-cross/build/mpy-cross -o test.mpy -s "test.mpy" test.py
$ ../../tools/mpy-tool.py -f -q build-standard/genhdr/qstrdefs.preprocessed.h test.mpy > frozen.c
$ gcc -I. -I../.. -Ivariants/standard -Ibuild-standard -DMPZ_DIG_SIZE=16 -c frozen.c
$ size frozen.o
   text    data     bss     dec     hex filename
    155     176       0     331     14b frozen.o

Trade-offs and Alternatives

This PR does not help recover from cases where the module is given an incompatible name (e.g. one starting with a digit), but that wouldn't have worked in the first place.

Generative AI

I did not use generative AI tools when creating this PR.