Data loss in string transfer through sys.stdout.write
Env: MicroPython v1.20.0 on 2023-04-26; Raspberry Pi Pico with RP2040
I have noticed consistent data loss when transferring string objects with the sys.stdout.write() method.
MCU code:
import sys
BUF_SIZE = 10000
while True:
    sys.stdin.readline()
    line = bytes(BUF_SIZE).hex() + '\n'
    sys.stdout.write(line)
PC code:
import serial
connection = serial.Serial('COM6', timeout=3)
for i in range(1000):
    connection.write(b'\n')
    res = connection.readline()
    print(i, len(res))
If BUF_SIZE is set to 8000, everything works fine.
When sending bytes instead:
import sys
BUF_SIZE = 40000
while True:
    sys.stdin.readline()
    line = bytes(BUF_SIZE)
    sys.stdout.write(line)
the transfer is fine even for 40000 bytes.
Make sys.stdout.buffer.write() return the number of bytes written.
For a bit of background see https://github.com/orgs/micropython/discussions/11807
MicroPython code may depend on the return value of sys.stdout.buffer.write() to reflect the number of bytes actually written. In most cases a write() succeeds, but when it does not, data is lost silently if write() simply returns the number of bytes that should have been written.
One reason a write may fail is that USB is in use and the receiving end does not read fast enough to drain its receive buffer. In that case, the write on the MicroPython side times out and, without this patch, part of the data is lost. This behavior was observed between a Pi Pico as client and a Linux host using the ACM driver.
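To illustrate why an accurate return value matters, here is a minimal sketch of a retry loop that a script could use once write() reports the number of bytes actually written. The helper name write_all and the max_stalls parameter are hypothetical and not part of this patch; the sketch also assumes that write() returns 0 or None when nothing could be sent.

```python
def write_all(stream, data, max_stalls=100):
    """Write all of `data` (a bytes-like object) to `stream`.

    Assumes stream.write() returns the number of bytes actually written,
    or 0/None when nothing could be written. Returns the total number of
    bytes written, which is less than len(data) if the stream keeps stalling.
    """
    view = memoryview(data)
    total = 0
    stalls = 0
    while total < len(view):
        n = stream.write(view[total:])
        if not n:
            # Nothing was accepted this time; give up after too many tries.
            stalls += 1
            if stalls > max_stalls:
                break
            continue
        stalls = 0
        total += n
    return total
```

On the board this could be called as write_all(sys.stdout.buffer, line.encode()). The loop retries immediately; a real script would probably want a short sleep between attempts.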
This patch addresses this issue by at least moving the responsibility of supplying a return value from the core code in sys_stdio_mphal.c to the respective mp_hal_stdout_tx_strn() functions in each port. The next step is to make each port return the number of bytes actually written. This patch already implements that for ports that use the write() system call or that write to USB with tud_cdc_write().
A tricky problem is where mp_hal_stdout_tx_strn() has multiple outputs, e.g. USB combined with dupterm and/or hardware UART. In such cases, only successfully written bytes should probably be sent to dupterm, or else a second attempt to submit the same data will show on dupterm as duplicated data, and/or dupterm shows data that is not visible on USB.
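The forwarding rule suggested above can be modeled in a few lines. This is only a Python model of logic that would live in a port's C code; tx_mirrored, primary_write and mirror_write are hypothetical names, with primary_write standing in for the USB output and mirror_write for dupterm or a UART.

```python
def tx_mirrored(primary_write, mirror_write, data):
    """Send `data` to the primary output and mirror only what it accepted.

    primary_write(data) is assumed to return the number of bytes it
    actually accepted (or None for zero). Only that prefix is forwarded
    to mirror_write, so a later retry of the remainder is not duplicated.
    """
    n = primary_write(data) or 0
    if n:
        mirror_write(data[:n])
    return n
```

With this rule, a caller that retries data[n:] after a short write produces the same byte sequence on both outputs.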
sys.stdout.write() is a bit more difficult (but not impossible) to fix because it modifies the data it sends (so-called "cooked" output). It is kept as-is in this patch. The tradeoff made here is that developers who really care about the return value of write() should probably use sys.stdout.buffer.write() anyway.
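A small model shows why cooked output complicates the return value. It assumes that cooking means expanding every '\n' to '\r\n'; the functions cook and raw_consumed are hypothetical helpers for illustration, not MicroPython APIs.

```python
def cook(data):
    """Model of cooked output: expand each newline to CR LF."""
    return data.replace(b"\n", b"\r\n")


def raw_consumed(raw, wire_written):
    """Count the leading bytes of `raw` fully covered by `wire_written`
    bytes of cooked output. A newline costs two bytes on the wire."""
    consumed = 0
    used = 0
    for byte in raw:
        cost = 2 if byte == 0x0A else 1
        if used + cost > wire_written:
            break
        used += cost
        consumed += 1
    return consumed


raw = b"a\nb\n"
wire = cook(raw)  # 4 input bytes become 6 bytes on the wire
```

If the transport accepts only 2 of the 6 wire bytes, the '\r' has been sent but its '\n' has not, so no input byte count describes the state exactly: raw_consumed(raw, 2) reports 1, yet a retry from offset 1 would send the '\r' a second time.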
This patch may break some existing code, but it is expected to fix existing code as well: the odd return value of sys.stdout.buffer.write() is undocumented and probably not well known, so code is unlikely to depend on the prior (broken) behavior.