mpremote: Writing fails when using mount with multibyte characters.
I'm on MicroPython v1.21.0 on 2023-10-06; PORTENTA with STM32H747
The following script runs fine when executed on the board, e.g. with mpremote connect id:3871345 run ./examples/demo.py
with open("foo.txt", 'w', encoding='utf-8') as file:
    file.write("🔢 Data" + '\n')
    file.write("🔢 Data" + '\n')
with open("foo.txt", 'r', encoding='utf-8') as file:
    data = file.read()
    print(data)
It correctly prints
🔢 Data
🔢 Data
However, it fails when the current directory is mounted first, e.g. with mpremote connect id:387134 mount . run ./examples/demo.py
It prints
Local directory . is mounted at /remote
ta
ta
🔢 Da🔢 Da
Note that the "ta" output is produced by the write() calls, which should not print anything to the console.
If I modify the above example to write the following data, it stalls:
file.write("🔢 Data" + '\n' + "🔢 Data")
This looks like an issue with how multibyte characters are handled when using mount.
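The symptoms above are consistent with a length prefix computed from the string's character count while its UTF-8 bytes are what gets transmitted. "🔢 Data\n" is 7 characters but 10 UTF-8 bytes, so a host that trusts the prefix stores only the first 7 bytes ("🔢 Da") and 3 bytes ("ta\n") are left over. That would explain both the stray ta lines and the "🔢 Da🔢 Da" file content. The CPython sketch below models this assumption. The framing (a 4-byte little-endian length followed by the payload) and the names send_write, receive_write and simulate are hypothetical. This is not mpremote's actual code.

```python
import io
import struct


def send_write(stream, data, encode_first):
    """Hypothetical device-side write: 4-byte little-endian length, then payload."""
    if encode_first and isinstance(data, str):
        data = data.encode("utf-8")  # the fix: measure bytes, not characters
    length = len(data)  # for a str this is a character count (the suspected bug)
    if isinstance(data, str):
        data = data.encode("utf-8")  # what actually goes over the wire
    stream.write(struct.pack("<i", length))
    stream.write(data)


def receive_write(stream):
    """Hypothetical host side: trust the length prefix and read that many bytes."""
    (length,) = struct.unpack("<i", stream.read(4))
    return stream.read(length)


def simulate(text, encode_first):
    """Return (bytes stored in the file, bytes left unread on the link)."""
    link = io.BytesIO()
    send_write(link, text, encode_first)
    link.seek(0)
    stored = receive_write(link)
    leftover = link.read()  # in the reported case, bytes like these seem to reach the console
    return stored, leftover


text = "\U0001F522 Data\n"  # "🔢 Data\n": 7 characters, 10 UTF-8 bytes
print(simulate(text, encode_first=False))  # (b'\xf0\x9f\x94\xa2 Da', b'ta\n')
print(simulate(text, encode_first=True))   # (b'\xf0\x9f\x94\xa2 Data\n', b'')
```

Encoding to bytes before computing the length keeps the prefix and the payload in agreement, which is the idea behind the transport write fix described in the pull request.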
Fix multiple Unicode issues in mpremote.
Summary
This pull request addresses multiple issues in mpremote's handling of special characters and Unicode. It fixes the escaping of quotes in filenames, the parsing of filenames and arguments containing equals signs, Unicode encoding errors on Windows consoles, and encoding errors when writing Unicode content to a mounted host folder. Together, these changes let mpremote work reliably with non-ASCII filenames and text.
- Unicode-safe Windows console output: Detects modern consoles, sets UTF-8 code pages, wraps stdout/stderr when needed, and uses raw UTF-8 writes when possible; legacy consoles now handle split UTF-8 sequences safely.
- Robust stdout handling: Buffers partial UTF-8 sequences and strips CTRL-D without losing characters, improving REPL/output correctness for multibyte text.
- Safer path quoting: Filesystem commands now use repr-based quoting so filenames with quotes, backslashes, or Unicode work correctly (including equals-sign parsing).
- CLI parsing fix: Command expansion no longer misinterprets arguments containing =, preventing unexpected-argument errors.
- Transport write safety: Converts strings to UTF-8 bytes before writing, avoiding encoding errors when writing Unicode content to a host folder mounted with mpremote mount <folder>.
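The buffering of partial UTF-8 sequences described above can be sketched with Python's standard incremental decoder. Serial reads can split a multibyte character across chunks, and decoding each chunk independently fails; for example, b"\xf0\x9f".decode("utf-8") raises UnicodeDecodeError. The class name Utf8StreamPrinter is hypothetical, and this is not the pull request's implementation. It only illustrates the technique. Removing CTRL-D (0x04) from each chunk is safe because every byte of a UTF-8 multibyte sequence is 0x80 or above.

```python
import codecs

CTRL_D = b"\x04"


class Utf8StreamPrinter:
    """Hypothetical helper that decodes device output arriving in arbitrary chunks."""

    def __init__(self):
        # The incremental decoder keeps an incomplete trailing sequence
        # buffered until the remaining bytes arrive in a later chunk.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def feed(self, chunk):
        # 0x04 never occurs inside a UTF-8 multibyte sequence (those bytes are
        # all >= 0x80), so removing it cannot split or corrupt a character.
        return self._decoder.decode(chunk.replace(CTRL_D, b""))

    def flush(self):
        # Emit anything still buffered; an incomplete tail is replaced with U+FFFD.
        return self._decoder.decode(b"", final=True)


data = "\U0001F522 Data\n".encode("utf-8") + CTRL_D
printer = Utf8StreamPrinter()
# Split inside the 4-byte emoji, as a serial read might.
out = printer.feed(data[:2]) + printer.feed(data[2:]) + printer.flush()
print(out == "\U0001F522 Data\n")  # True
```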
Fixes: #13055
Fixes: #15228
Fixes: #18658
Fixes: #18657
</p>
</details>
Testing
- New test support: Adds ramdisk helper, enhanced test runner (device selection, skip handling), and Unicode/special-character coverage in the mpremote test suite.
- The new Unicode tests cover the scenarios reported in the linked issues.
The current CI setup does not run tests on Windows, so the fixes were verified manually.
Manual testing was conducted on Windows (pwsh, cmd.exe and MinGW) and on Linux in WSL2.
<img width="1468" height="1306" alt="image" src="https://github.com/user-attachments/assets/bb0ce2a3-4b83-40dc-bd8d-fb7b622e3863" />
Manual testing was necessary to confirm the fixes across these environments, because the bash-based test framework does not support Windows.
In addition, the mpremote REPL and console cannot be tested by the bash test suite at all.
I have started work on a pytest configuration for mpremote that adds the ability to test the REPL and console.
It will be submitted in a separate PR once it is stable across multiple platforms.
Trade-offs and Alternatives
The changes do not introduce significant trade-offs. The improved Unicode handling adds some complexity to the code, but that complexity is needed for multibyte text to work correctly. Alternative approaches, such as different quoting mechanisms, were considered, but the current solutions offer the best balance of compatibility and simplicity.
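As an illustration of why repr-based quoting is more robust than hand-written quotes: when a host tool generates Python source to run on the device, a filename such as it's.txt ends the quoted literal early, and a backslash sequence such as \n in a Windows-style path is silently reinterpreted. The sketch below assumes commands are built as source text such as os.remove(...). The helper names are hypothetical and this is not mpremote's code. Each generated command is checked by parsing it with CPython's ast module.

```python
import ast


def naive_remove_cmd(path):
    # Hypothetical: wrap the path in hand-written single quotes.
    return f"os.remove('{path}')"


def repr_remove_cmd(path):
    # repr() yields a valid Python string literal, escaping quotes and
    # backslashes as needed.
    return f"os.remove({path!r})"


def literal_arg(cmd):
    """Return the string the generated code would pass to os.remove,
    or None if the generated code is not valid Python."""
    try:
        call = ast.parse(cmd, mode="eval").body
    except SyntaxError:
        return None
    return ast.literal_eval(call.args[0])


for name in ["it's.txt", "dir\\new.txt", "\U0001F522 data.txt"]:
    print(literal_arg(naive_remove_cmd(name)) == name,
          literal_arg(repr_remove_cmd(name)) == name)
# False True   (the quote ends the literal early: syntax error)
# False True   ("\n" in the path is read back as a newline)
# True True
```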