mpremote: Writing fails when using mount with multibyte characters
I'm on MicroPython v1.21.0 on 2023-10-06; PORTENTA with STM32H747
The following script runs fine when executed directly on the board, e.g. mpremote connect id:3871345 run ./examples/demo.py
with open("foo.txt", 'w', encoding='utf-8') as file:
    file.write("🔢 Data" + '\n')
    file.write("🔢 Data" + '\n')
with open("foo.txt", 'r', encoding='utf-8') as file:
    data = file.read()
    print(data)
It correctly prints
🔢 Data
🔢 Data
However, it fails when run after mounting the current directory, e.g. mpremote connect id:387134 mount . run ./examples/demo.py
It prints
Local directory . is mounted at /remote
ta
ta
🔢 Da🔢 Da
Please note that the "ta" output appears during the write() calls, which shouldn't print anything to the console.
If I modify the above example to write the following data, it stalls:
file.write("🔢 Data" + '\n' + "🔢 Data")
Looks like an issue with how multi byte characters are handled when using mount.
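The stray "ta" is consistent with a length prefix that counts characters instead of encoded bytes. Below is a minimal sketch of that arithmetic in plain Python. It assumes (this is not confirmed against the mpremote source here) that the mounted-file write announces len() of the string and then sends its UTF-8 bytes:

```python
# Assumption: the transport announces len(text) as the payload length and
# then sends the UTF-8 encoding of text. Illustration only, not mpremote code.
text = "🔢 Data" + "\n"
encoded = text.encode("utf-8")

char_count = len(text)      # characters: the emoji counts as 1
byte_count = len(encoded)   # bytes: the emoji takes 4 bytes in UTF-8

# Bytes beyond the announced length would be left over on the serial link.
leftover = encoded[char_count:]

print(char_count, byte_count, leftover)
```

Under that assumption, the bytes left over after each write are "ta" plus a newline, which matches the unexpected console output above.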
mpremote mount: file write of non-byte arrays fails due to incorrect length calculation
Port, board and/or hardware
rp2 port on an RP2350
MicroPython version
MicroPython v1.25.0-162.gf8fe70505 on 2025-06-03; Raspberry Pi Pico2 with RP2350
(Reproduced on a variety of other versions from released v1.25.0 to current git HEAD.)
Reproduction
Writing an array whose element type is larger than one byte doesn't work properly (junk is leaked to the console) if the file is on a host directory mounted using mpremote mount:
# mpremote mount /tmp repl
Local directory /tmp is mounted at /remote
Connected to MicroPython at /dev/ttyACM0
Use Ctrl-] or Ctrl-x to exit this shell
>>> from array import array
>>> h = array('h', 0x4041 for _ in range(50))
>>> with open('testfile', 'wb') as f:
...     f.write(h)
...
A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@50
>>>
Expected behaviour
This should work fine. It works with a write to the 'real' filesystem on the device, and when writing a bytearray() or array('b') to /remote mounted over mpremote:
>>> b = array('b', 0x40 for _ in range(50))
>>> with open('testfile', 'wb') as f:
...     f.write(b)
...
50
>>> h = array('h', 0x4041 for _ in range(50))
>>> with open('/not-remote', 'wb') as f:
...     f.write(h)
...
100
>>>
Even f.write(bytes(h)) over mpremote mount works okay:
>>> with open('testfile2', 'wb') as f:
...     f.write(bytes(h))
...
100
>>>
Note in these examples that f.write(h) correctly returns the number of bytes (100), not the number of items (50).
Observed behaviour
Junk is leaked to the interactive session, as shown above, and f.write(h) returns 50 (the number of items) not 100 (the number of bytes that should have been written).
Additional Information
I think this happens because the array (implementing the buffer protocol) is passed directly to RemoteCommand.wr_bytes() by RemoteFile.write() in tools/mpremote/mpremote/transport_serial.py. This looks like:
def wr_bytes(self, b):
    self.wr_s32(len(b))
    self.fout.write(b)
But in the buffer protocol which array.array correctly implements, len(b) is the length in items not the length in bytes, so the length-prefix will be wrong when the item size is larger than 1.
In the original example, the length prefix is (int32_t) 50, but 50 16-bit words are then written, for a total of 100 bytes instead of 50. The extra 50 bytes leak to the console.
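The mismatch can be reproduced off-device with a small simulation in desktop Python. This is only a sketch: wr_bytes_by_len mirrors the quoted wr_bytes(), while read_frame is a hypothetical receiver and the little-endian "<i" prefix encoding is an assumption, since wr_s32() is not shown here.

```python
import io
import struct
from array import array


def wr_bytes_by_len(fout, b):
    # Mirrors the quoted wr_bytes(): the prefix is len(b), an item count.
    # Assumption: the 32-bit prefix is little-endian.
    fout.write(struct.pack("<i", len(b)))
    fout.write(b)


def read_frame(fin):
    # Hypothetical receiver: read the prefix, then exactly that many bytes.
    (n,) = struct.unpack("<i", fin.read(4))
    payload = fin.read(n)
    leftover = fin.read()  # bytes the prefix did not account for
    return n, payload, leftover


h = array("h", [0x4041] * 50)
wire = io.BytesIO()
wr_bytes_by_len(wire, h)
wire.seek(0)

n, payload, leftover = read_frame(wire)
print(n, len(payload), len(leftover))
```

In this simulation the receiver consumes 50 bytes as payload and 50 bytes are left unread, which lines up with the 50 bytes of junk seen in the REPL.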
Is there an easy way to get the correct byte length of a buffer in MicroPython without allocating or copying? I know len(bytes(b)) would work, but that copies a potentially large buffer. I'm not sure what's guaranteed to be available and what features might be optional and compiled out on some ports. Presumably there must be a defined, always-available way to get either the item size or the total size from Python, or a way to cast to a memoryview() in bytes rather than a memoryview() in items?
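For comparison, here is how the byte length can be obtained without copying in desktop CPython. The helper names byte_length and as_bytes_view are hypothetical, and whether a given MicroPython build exposes memoryview.itemsize or memoryview.cast() is exactly the open question above, so treat this as a sketch of the idea and not a drop-in fix.

```python
from array import array


def byte_length(buf):
    # Total size in bytes of a one-dimensional buffer, without copying.
    mv = memoryview(buf)
    return len(mv) * mv.itemsize


def as_bytes_view(buf):
    # Zero-copy view of the same memory with one-byte items (CPython).
    return memoryview(buf).cast("B")


h = array("h", [0x4041] * 50)
print(len(h), byte_length(h), len(as_bytes_view(h)))
```

Either value could serve as the length prefix. The byte view has the extra property that len() and the written data agree by construction.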
Do we also have a similar issue with readinto()'s handling of the capacity of the destination buffer, where len(buf) is sent as part of CMD_READ? We might be reading to a word array, for example.
def readinto(self, buf):
    c = self.cmd
    c.begin(CMD_READ)
    c.wr_s8(self.fd)
    c.wr_s32(len(buf))
    n = c.rd_bytes(buf)
    c.end()
    return n
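If the requested size is len(buf), a multi-byte array would presumably be filled only partially. The sketch below illustrates that effect in desktop CPython with io.BytesIO standing in for the host side. The helpers read_by_len and read_by_size are hypothetical, and the actual mpremote behaviour has not been verified here.

```python
import io
from array import array


def read_by_len(source, buf):
    # Requests len(buf) bytes: an item count, as in the quoted readinto().
    view = memoryview(buf).cast("B")
    return source.readinto(view[:len(buf)])


def read_by_size(source, buf):
    # Requests the buffer's full capacity in bytes.
    return source.readinto(memoryview(buf).cast("B"))


data = bytes(range(1, 17))      # 16 bytes available to read
short = array("h", [0] * 8)     # 8 items of 2 bytes each
full = array("h", [0] * 8)

n_short = read_by_len(io.BytesIO(data), short)
n_full = read_by_size(io.BytesIO(data), full)
print(n_short, n_full)
```

With the item count as the request, only the first half of the array is filled and the remaining items keep their old contents.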