mpremote mount fails reading binary file
Checks
- I agree to follow the MicroPython Code of Conduct to ensure a safe and respectful space for everyone.
- I've searched for existing issues matching this bug, and didn't find any.
Port, board and/or hardware
RP2, Pyboard 1.1
MicroPython version
MicroPython v1.22.0 on 2023-12-27; Raspberry Pi Pico with RP2040
Reproduction
Create a file rats15.py on the PC:
import os

fn = "delete_me"
with open(fn, "wb") as f:
    f.write(b"hello\n\xde\xad\xbe\xef")
with open(fn, "rb") as f:
    print(f.readline())
    print(f.read(4))
os.unlink(fn)
Run mpremote:
$ mpremote mount .
At the REPL, enter:
import rats15
Expected behaviour
>>> import rats15
b'hello\n'
b'\xde\xad\xbe\xef'
>>>
This is the output when the script is run under CPython, under the Unix port, or locally on a MicroPython target.
Observed behaviour
When run via mpremote mount . as described above:
MicroPython v1.22.2 on 2024-02-22; Raspberry Pi Pico with RP2040
Type "help()" for more information.
>>> import rats15
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "rats15.py", line 7, in <module>
File "<stdin>", line 147, in readline
TypeError: unsupported types for __add__: 'str', 'bytes'
>>>
Additional Information
mpremote is v1.22.0. The fault occurs in readline(). It was first noticed when reading a PGM graphics file, which contains four lines of \n-terminated ASCII text followed by binary data.
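The traceback suggests that the remote file's readline() accumulates its result in a str and then appends the bytes returned by read() in binary mode. A minimal CPython sketch of a mode-aware readline, assuming a hypothetical read_one() callable that returns one character (text mode) or one byte (binary mode) and an empty value at EOF; this is an illustration, not mpremote's actual code:

```python
import io


def readline_sketch(read_one, binary):
    """Read up to and including the next newline.

    read_one is a hypothetical callable returning a length-1 str (text mode)
    or bytes (binary mode), or an empty value at end of file.
    """
    # Start from an empty value of the same type read_one() returns, so the
    # concatenation below never mixes str and bytes.
    line = b"" if binary else ""
    newline = b"\n" if binary else "\n"
    while True:
        c = read_one()
        if not c:  # end of file
            break
        line += c
        if c == newline:
            break
    return line


# Simulate the file from the report: one text line followed by binary data.
data = io.BytesIO(b"hello\n\xde\xad\xbe\xef")
print(readline_sketch(lambda: data.read(1), binary=True))  # b'hello\n'
print(data.read(4))  # b'\xde\xad\xbe\xef'
```

The key design point is only that the accumulator's type follows the file mode; the byte-at-a-time loop is kept for clarity, not efficiency.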
mpremote mount: file write of non-byte arrays fails due to incorrect length calculation
Port, board and/or hardware
rp2 port on an RP2350
MicroPython version
MicroPython v1.25.0-162.gf8fe70505 on 2025-06-03; Raspberry Pi Pico2 with RP2350
(Reproduced on a variety of other versions from released v1.25.0 to current git HEAD.)
Reproduction
Writing an array whose element type is wider than one byte to a file on a filesystem mounted from the host with mpremote mount doesn't work properly (junk is leaked to the console):
# mpremote mount /tmp repl
Local directory /tmp is mounted at /remote
Connected to MicroPython at /dev/ttyACM0
Use Ctrl-] or Ctrl-x to exit this shell
>>> from array import array
>>> h = array('h', 0x4041 for _ in range(50))
>>> with open('testfile', 'wb') as f:
...     f.write(h)
...
A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@A@50
>>>
Expected behaviour
This should work. It does work when writing to the 'real' filesystem on the device, and when writing a bytearray() or array('b') to /remote over mpremote mount:
>>> b = array('b', 0x40 for _ in range(50))
>>> with open('testfile', 'wb') as f:
...     f.write(b)
...
50
>>> h = array('h', 0x4041 for _ in range(50))
>>> with open('/not-remote', 'wb') as f:
...     f.write(h)
...
100
>>>
Even f.write(bytes(h)) over mpremote mount works okay:
>>> with open('testfile2', 'wb') as f:
...     f.write(bytes(h))
...
100
>>>
Note in these examples that f.write(h) correctly returns the number of bytes (100), not the number of items (50).
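For comparison, CPython follows the same convention: len() of an array counts items, while a binary stream's write() reports bytes. A quick check, using io.BytesIO as a stand-in for a file opened with 'wb':

```python
import io
from array import array

h = array("h", (0x4041 for _ in range(50)))
f = io.BytesIO()
n = f.write(h)  # the buffer protocol exposes all of the array's bytes

print(len(h))             # 50: number of items
print(h.itemsize)         # 2 on typical platforms: bytes per 'h' item
print(n)                  # 100: number of bytes written
print(len(f.getvalue()))  # 100
```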
Observed behaviour
Junk is leaked to the interactive session, as shown above, and f.write(h) returns 50 (the number of items), not 100 (the number of bytes that should have been written).
Additional Information
I think this happens because RemoteFile.write() in tools/mpremote/mpremote/transport_serial.py passes the array (which implements the buffer protocol) directly to RemoteCommand.wr_bytes(), which looks like:
def wr_bytes(self, b):
    self.wr_s32(len(b))
    self.fout.write(b)
But for array.array, which correctly implements the buffer protocol, len(b) is the length in items, not in bytes, so the length prefix will be wrong whenever the item size is larger than 1.
In our original example, the length prefix is (int32_t) 50, but then 50 16-bit words are written: 100 bytes instead of 50. The extra 50 bytes leak to the console; each 0x4041 item stored little-endian is the byte pair 0x41 0x40, which prints as the "A@" seen above.
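To make the failure mode concrete, the CPython sketch below models the framing. It assumes a little-endian signed 32-bit length prefix (the exact wire format of wr_s32() is an assumption here) and a reader that consumes exactly the prefixed number of bytes; whatever remains in the stream is what would end up on the console. wr_bytes_buggy() and rd_bytes() are hypothetical names.

```python
import io
import struct
import sys
from array import array


def wr_bytes_buggy(out, b):
    # Mirrors the reported bug: len(b) counts items, not bytes.
    out.write(struct.pack("<i", len(b)))  # assumed prefix format
    out.write(b)


def rd_bytes(inp):
    # Reader side: trust the prefix and read exactly that many bytes.
    (n,) = struct.unpack("<i", inp.read(4))
    return inp.read(n)


h = array("h", (0x4041 for _ in range(50)))
if sys.byteorder == "big":
    h.byteswap()  # model the little-endian RP2350 regardless of host order

stream = io.BytesIO()
wr_bytes_buggy(stream, h)
stream.seek(0)

payload = rd_bytes(stream)
leftover = stream.read()
print(len(payload), len(leftover))  # 50 50
print(leftover.decode("ascii"))     # "A@" repeated 25 times
```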
Is there an easy way to get the correct byte length of a buffer from MicroPython without allocating or copying? I know len(bytes(b)) would work, but that copies a potentially large buffer. I'm not sure what's guaranteed to be available and what features might be optional and compiled out on some ports. Presumably we must have some defined, always-available way to get either the item size or the total size from Python, or a way to cast to a memoryview() in bytes rather than a memoryview() in items?
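In CPython, one answer is memoryview: it wraps the object without copying, and its nbytes attribute (or len() times itemsize) gives the size in bytes. Whether memoryview and its itemsize attribute are available on every MicroPython port is an assumption that would need checking. The sketch only illustrates the idea; byte_len() and wr_bytes_fixed() are hypothetical names, and the little-endian 32-bit prefix format is assumed.

```python
import io
import struct
from array import array


def byte_len(b):
    """Size of a bytes-like object in bytes, without copying its data."""
    mv = memoryview(b)  # a view over the same memory, no copy
    # CPython provides nbytes; fall back to items * itemsize where it doesn't.
    nbytes = getattr(mv, "nbytes", None)
    if nbytes is None:
        nbytes = len(mv) * mv.itemsize
    return nbytes


def wr_bytes_fixed(out, b):
    # The length prefix now counts bytes rather than items.
    out.write(struct.pack("<i", byte_len(b)))
    out.write(b)


h = array("h", range(50))
print(len(h), byte_len(h))  # 50 100
```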
Do we also have a similar issue with readinto()'s handling of the destination buffer's capacity, where len(buf) is sent as part of CMD_READ? We might be reading into a word array, for example.
def readinto(self, buf):
    c = self.cmd
    c.begin(CMD_READ)
    c.wr_s8(self.fd)
    c.wr_s32(len(buf))
    n = c.rd_bytes(buf)
    c.end()
    return n
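If so, the effect would be an under-read: for a 10-item array('h'), len(buf) is 10 but the buffer holds 20 bytes. A CPython sketch of sizing the request in bytes, where remote_read() is a hypothetical stand-in for the CMD_READ round trip and the destination is viewed as raw bytes with memoryview.cast('B'). cast() is a CPython feature; its availability on MicroPython is not assumed.

```python
import io
from array import array


def readinto_sketch(remote_read, buf):
    """Fill buf from remote_read(n), sizing the request in bytes, not items."""
    dest = memoryview(buf).cast("B")  # flat byte view of buf, no copy
    data = remote_read(len(dest))     # request the full capacity in bytes
    dest[: len(data)] = data
    return len(data)


src = io.BytesIO(bytes(range(40)))
buf = array("h", bytes(20))  # 10 items of 2 bytes each
n = readinto_sketch(src.read, buf)
print(len(buf), n)  # 10 20
```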
Code of Conduct
Yes, I agree