MicroPython assumes valid prelude and bytecode in .mpy files
We're running MicroPython as a task in an embedded product, and feel it's relatively safe to run Python code in this sandboxed environment. It accesses the file system in a restricted context, and Python code shouldn't be able to access data outside of the MicroPython heap and data structures embedded in the firmware.
We're looking at supporting .mpy files in this environment. It seems safe to allow .mpy files created on the device and stored such that the user cannot modify the contents. We'd also like to support use of mpy-cross to compile files that require a larger heap.
But we're concerned that users could modify the .mpy file in ways that would (for example) allow for reading any memory address or overwriting areas of RAM outside of the MicroPython heap or its task's stack.
I've been looking at file contents outside of the actual bytecode to begin with, and would like to implement some sanity checks on some values. For example, n_def_pos_args must be <= n_pos_args. And it looks like n_state should be at least n_pos_args + n_kwonly_args + 1.
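As a rough illustration, here are the two checks above written in Python for readability; the real checks would presumably live in the C code that loads .mpy files. `Prelude` and `prelude_errors` are hypothetical names, not existing MicroPython APIs, and the `n_state` lower bound is the guess stated above rather than a confirmed invariant.

```python
from dataclasses import dataclass


@dataclass
class Prelude:
    """Hypothetical container for values already decoded from a function prelude."""
    n_state: int
    n_pos_args: int
    n_kwonly_args: int
    n_def_pos_args: int


def prelude_errors(p):
    """Return a list of failed sanity checks; an empty list means none failed."""
    errors = []
    # There cannot be more default positional args than positional args.
    if p.n_def_pos_args > p.n_pos_args:
        errors.append("n_def_pos_args exceeds n_pos_args")
    # Assumed lower bound: one state slot per argument, plus one.
    if p.n_state < p.n_pos_args + p.n_kwonly_args + 1:
        errors.append("n_state too small for the declared arguments")
    return errors
```

A loader would reject the file as soon as this list is non-empty, before any bytecode from it is executed.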
Is it possible to calculate a value for n_exc_stack by doing a validation pass on the bytecode? Or even a sanity check on the three _args settings (ensure the bytecode doesn't reference an arg index beyond what's configured)? Are there other checks we could perform?
I feel that it's better to add this burden to the import phase and reject invalid .mpy files instead of adding range checks to the VM.
We plan to implement these behind a MICROPY_ configuration macro and eventually submit a PR. Open to recommendations on a name for that macro.
Add ability to have frozen bytecode
This PR brings proper frozen bytecode. Bytecode is compiled using a "MicroPython cross compiler", and then frozen using tools/mpy-tool.py. The output is a .c file that's compiled into your uPy binary. Then "import frozenmod" works as expected. Importing the resulting frozen module needs no RAM for compilation (but does require about 1 GC block to hold the function object, although that could eventually be optimised away).
The main changes in the PR are:
- support in core for frozen modules (including import and adding a new qstr pool for the frozen qstrs)
- addition of top-level dir upy-compiler/ which contains an example cross compiler (that's real-world usable, just needs editing of mpconfigport.h to suit a given target)
- minor modifications to minimal port to include an example of frozen bytecode (.mpy file is included so you don't need the cross-compiler to build it)
One thing to note is that MICROPY_MODULE_FROZEN has changed meaning, and all old uses of this macro should be (and in this PR are) changed to MICROPY_MODULE_FROZEN_STR. There is now also MICROPY_MODULE_FROZEN_MPY, so both the original frozen strings and the new frozen bytecode/mpy files are supported.
Question: should upy-compiler be included? I think so. What should the dir be called? What should the executable be called? (I just made it "micropython".)