`natmod`: allow static `.bss` variables
Description
This is an improvement request based on my experience of compiling the LoRa Basic Modem as a MicroPython native module.
The natmod documentation says:
static BSS variables are not supported; workaround: use global BSS variables
Static variables are widely used in existing code bases, so applying this workaround requires a lot of patching.
However, this limitation only affects certain platforms:
┌──────┬───────┬───────┬──────────┬──────────┬─────────────┬─────────────┬──────────┬─────────────┐
│ │ x86 │ x64 │ armv6m │ armv7m │ armv7emsp │ armv7emdp │ xtensa │ xtensawin │
├──────┼───────┼───────┼──────────┼──────────┼─────────────┼─────────────┼──────────┼─────────────┤
│ test │ OK │ x │ x │ x │ x │ x │ OK │ OK │
└──────┴───────┴───────┴──────────┴──────────┴─────────────┴─────────────┴──────────┴─────────────┘
Maybe it's possible to improve mpy_ld to get rid of this limitation. This would greatly simplify things.
Implementation
I hope the MicroPython maintainers or community will implement this feature.
Code of Conduct
Yes, I agree
natmod: Allow linking with static libraries
Summary
When building non-trivial native modules, the compiler runtime needs to be linked manually, which is a tedious process. This PR allows mpy_ld to automatically resolve dependencies and link with libgcc.a and libm.a (or other user-specified static libraries). It also improves the reporting format of multiple-definition and unresolved-symbol errors.
The automatic process also ensures that only the required object files from the archives are linked into the mpy file. When linking manually, it is easy to include unneeded object files, which only inflate the binary.
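The selection of "only the required object files" can be illustrated with a small sketch. This is not the code in mpy_ld; it is a toy model that assumes each archive member has already been reduced to a pair of symbol sets (defined, undefined). The function name `select_members` and the member names in the usage note are made up for illustration.

```python
from collections import deque


def select_members(undefined, archive, defined=frozenset()):
    """Pick the archive members needed to satisfy `undefined`.

    `archive` maps a member name to a (defined, undefined) pair of symbol
    sets. `defined` holds symbols the module already provides itself.
    Returns (selected members in pull-in order, symbols nobody defines).
    This is a hypothetical helper, not part of mpy_ld.
    """
    # Index every symbol to the first member that defines it.
    providers = {}
    for member, (member_defs, _) in archive.items():
        for sym in member_defs:
            providers.setdefault(sym, member)

    selected = []
    unresolved = set()
    seen = set(defined)
    queue = deque(sorted(undefined))
    while queue:
        sym = queue.popleft()
        if sym in seen:
            continue
        seen.add(sym)
        member = providers.get(sym)
        if member is None:
            unresolved.add(sym)
            continue
        if member not in selected:
            selected.append(member)
            # A pulled-in member may need further symbols of its own.
            queue.extend(sorted(archive[member][1]))
    return selected, unresolved
```

With a toy archive such as `{"div.o": ({"__divsi3"}, {"__div0"}), "div0.o": ({"__div0"}, set()), "mul.o": ({"__muldi3"}, set())}`, asking for `__divsi3` pulls in `div.o` and, transitively, `div0.o`, while `mul.o` stays out of the output.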
Loading large .a files takes quite some time, so the implementation caches the parsing results to minimize the impact on the developer experience (and the planet 🌱).
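One way such a cache could work is sketched below. This is an assumption for illustration, not a description of what the PR implements: the cache key is derived from the archive's absolute path, size and modification time, and the parsed result is stored as JSON. The names `cached_parse`, `parse` and `cache_dir` are hypothetical.

```python
import hashlib
import json
import os


def cached_parse(path, parse, cache_dir):
    """Return parse(path), reusing a cached result while the file is unchanged.

    `parse` is any callable that returns JSON-serialisable data.
    Hypothetical helper, not the caching code used by mpy_ld.
    """
    st = os.stat(path)
    # The key changes whenever the path, size or modification time changes.
    key_src = f"{os.path.abspath(path)}:{st.st_size}:{st.st_mtime_ns}"
    key = hashlib.sha256(key_src.encode()).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".json")

    try:
        with open(cache_file) as f:
            return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache entry: fall through and re-parse

    result = parse(path)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file first so a partial write is never read back.
    tmp = cache_file + ".tmp"
    with open(tmp, "w") as f:
        json.dump(result, f)
    os.replace(tmp, cache_file)
    return result
```

Keying on size and modification time avoids hashing the whole archive on every build, at the cost of missing a change that preserves both.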
MICROPY_ARCH_CFLAGS was added so that it can be conveniently used in the Makefile (e.g. to cross-build third-party libraries).
Testing
This was developed as part of the wasm2mpy initiative.
As a result, I was able to produce builds without including the runtime object files in the wasm2mpy repository:
https://github.com/vshymanskyy/wasm2mpy/actions/runs/10834266784/job/30063030978
┌────────────────┬───────┬───────┬──────────┬──────────┬─────────────┬─────────────┬──────────┬─────────────┐
│ │ x86 │ x64 │ armv6m │ armv7m │ armv7emsp │ armv7emdp │ xtensa │ xtensawin │
├────────────────┼───────┼───────┼──────────┼──────────┼─────────────┼─────────────┼──────────┼─────────────┤
│ assemblyscript │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ cpp │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ rust │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ tinygo │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ zig │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ virgil │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ wat │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
│ coremark │ 🟢 │ 🟢 │ 🟥 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │ 🟢 │
└────────────────┴───────┴───────┴──────────┴──────────┴─────────────┴─────────────┴──────────┴─────────────┘
Note: the armv6m/coremark build fails for unrelated reasons (reason1, reason2).
Also, examples/natmod/features2 was updated to use this functionality so it is included in the regular MicroPython CI tests.
Also, this was successfully used by @agatti and @jonnor. See comments below.
Motivation
- Fixes #5629
- #6186
- Removes bloat and hacks from emlearn
- https://github.com/micropython/micropython/issues/10432
- https://github.com/micropython/micropython/issues/14430
- https://github.com/orgs/micropython/discussions/9640
- https://github.com/orgs/micropython/discussions/11730
- https://forums.raspberrypi.com/viewtopic.php?t=320367
- https://forum.micropython.org/search.php?keywords=%22undefined+symbol%22&terms=all&author=&sc=1&sf=all&sr=posts&sk=t&sd=d&st=0&ch=300&t=0&submit=Search
Trade-offs and Alternatives
- Alternatively, one can do it manually, i.e. find and unpack the relevant `.a` files, resolve dependencies, and include the required object files in the build process.
- `mpy_ld` gets a new (optional) dependency on the `ar` package. `ar` is not required unless you either set `LINK_RUNTIME=1` or pass the `-l` option to `mpy_ld`.
- I've added some basic support for weak symbols, but it should be addressed via a separate PR.