@wohltat yes this is known see #846
aiohttp TLS websocket fails when continuously sending packets on ESP32
@Carglglz
I have an application in which I want to send TLS/SSL packets over websockets using aiohttp on an ESP32. The problem is that the websockets fail after a short while when the packets are a little larger, around 2 kB to 4 kB.
Here is a simple test script:
import aiohttp
import asyncio
import gc

# allocate and prepare fixed block of data to be sent
b = bytearray(10_000)
for k in range(100):
    p = k*100
    b[p:p] = b'X'*96 + f'{k:3}' + '\n'
mv = memoryview(b)

URL = "wss://somewebsocketserver/echo"
sslctx = False

if URL.startswith("wss:"):
    try:
        import ssl
        sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        sslctx.verify_mode = ssl.CERT_NONE
    except Exception:
        pass

async def ws_receive(ws : ClientWebSocketResponse):
    try:
        async for msg in ws:
            if msg.type != aiohttp.WSMsgType.TEXT:
                print('type:', msg.type, repr(msg.data))
    except TypeError as e:
        print('ws_receive', repr(e))

async def ws_test_echo(session):
    n = 1
    while True:
        ws_receive_task = None
        async with session.ws_connect(URL, ssl=sslctx) as ws:
            ws_receive_task = asyncio.create_task(ws_receive(ws))
            try:
                while True:
                    gc.collect()
                    print('-------------------', n, '------------------------')
                    await ws.send_str(b[0:100*n].decode())
                    n += 1
                    await asyncio.sleep_ms(100)
            except KeyboardInterrupt:
                pass
            finally:
                await ws.close()

async def main():
    async with aiohttp.ClientSession() as session:
        print('session')
        await ws_test_echo(session)

if __name__ == "__main__":
    asyncio.run(main())
remote
I get the following exception after a while. Just handling the exception is not satisfying, since it takes a while and the application is blocked during that time.
Task exception wasn't retrieved
future: <Task> coro= <generator object 'ws_receive' at 3ffec850>
Traceback (most recent call last):
File "asyncio/core.py", line 1, in run_until_complete
File "<stdin>", line 29, in ws_receive
File "/lib/aiohttp/aiohttp_ws.py", line 226, in __anext__
File "/lib/aiohttp/aiohttp_ws.py", line 171, in receive
File "/lib/aiohttp/aiohttp_ws.py", line 198, in _read_frame
File "asyncio/stream.py", line 1, in read
OSError: -113
Traceback (most recent call last):
File "<stdin>", line 66, in <module>
File "asyncio/core.py", line 1, in run
File "asyncio/core.py", line 1, in run_until_complete
File "asyncio/core.py", line 1, in run_until_complete
File "<stdin>", line 62, in main
File "<stdin>", line 56, in ws_test_echo
File "/lib/aiohttp/aiohttp_ws.py", line 233, in close
File "/lib/aiohttp/aiohttp_ws.py", line 194, in close
File "/lib/aiohttp/aiohttp_ws.py", line 189, in send
File "asyncio/stream.py", line 1, in drain
OSError: -113
(OSError: 113 = ECONNABORTED)
Notably, there are always some packets that MicroPython thinks have already been sent but that never reach the other side.
I also do not receive a websocket close frame.
This was tested on a remote websocket server that I don't control.
local
When I try it on a local server, I see different problems.
Traceback (most recent call last):
File "<stdin>", line 64, in <module>
File "asyncio/core.py", line 1, in run
File "asyncio/core.py", line 1, in run_until_complete
File "asyncio/core.py", line 1, in run_until_complete
File "<stdin>", line 60, in main
File "<stdin>", line 45, in ws_test_echo
File "/lib/aiohttp/aiohttp_ws.py", line 239, in send_str
File "/lib/aiohttp/aiohttp_ws.py", line 187, in send
File "asyncio/stream.py", line 1, in write
OSError: -104
(OSError: 104 = ECONNRESET)
The server seems not to receive the complete message, or only receives a corrupted message, and closes the connection:
...
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 37
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 38
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 39
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
b'\xe9\x82\xc3+\xe9\x82\xbbG\x81\xd0\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3+\xe9\x82\xc3'
Unhandled exception in client_connected_cb
transport: <asyncio.sslproto._SSLProtocolTransport object at 0x7f18b780ecf0>
Traceback (most recent call last):
File "/home/username/Documents/projects/rfid/websockets_python/microdot/examples/tls/microdot/microdot.py", line 1224, in serve
await self.handle_request(reader, writer)
File "/home/username/Documents/projects/rfid/websockets_python/microdot/examples/tls/microdot/microdot.py", line 1338, in handle_request
await writer.aclose()
File "/home/username/Documents/projects/rfid/websockets_python/microdot/examples/tls/microdot/microdot.py", line 1218, in aclose
await self.wait_closed()
File "/usr/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
await self._protocol._get_close_waiter(self)
File "/usr/lib/python3.11/asyncio/sslproto.py", line 648, in _do_shutdown
self._sslobj.unwrap()
File "/usr/lib/python3.11/ssl.py", line 983, in unwrap
return self._sslobj.shutdown()
^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2706)
The closing of the websocket is also not handled properly by the MicroPython application / aiohttp. The close frame is received but the connection is not closed automatically.
Another problem is that even if I handle the exceptions and reconnect automatically, my application will eventually crash because it runs out of memory.
This is really problematic for me, since I only send messages of around 2 kB.
I guess this is due to the not very memory-economical design of aiohttp. Sending and receiving always involve a lot of new memory allocation.
The use of preallocation and memoryview seems reasonable here.
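To make the preallocation idea concrete, here is a minimal sketch that runs on both CPython and MicroPython. The buffer size and variable names are illustrative only, and whether aiohttp's send path accepts a memoryview without copying is not something this thread confirms.

```python
# Preallocate once and reuse the same storage for every message.
buf = bytearray(1000)
mv = memoryview(buf)

chunk_copy = buf[0:200]   # slicing a bytearray allocates a new 200-byte object
chunk_view = mv[0:200]    # slicing a memoryview only creates a small view object

# A later write into the buffer is visible through the view, not through the copy.
buf[0] = 0x41
print(chunk_view[0], chunk_copy[0])
```

Note that the test script calls .decode() on the slice before sending, which allocates a new string regardless, so a view alone would not remove every allocation there.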
This is the local Python websocket server code:
import ssl
import sys
from microdot import Microdot
from microdot.websocket import with_websocket

app = Microdot()

html = '''<!DOCTYPE html>
<html>
<head>
<title>Microdot Example Page</title>
<meta charset="UTF-8">
</head>
<body>
<div>
<h1>Microdot Example Page</h1>
<p>Hello from Microdot!</p>
<p><a href="/shutdown">Click to shutdown the server</a></p>
</div>
</body>
</html>
'''

@app.route('/')
async def hello(request):
    return html, 200, {'Content-Type': 'text/html'}

@app.route('/echo')
@with_websocket
async def echo(request, ws):
    while True:
        data = await ws.receive()
        print(data)
        await ws.send(data)

@app.route('/shutdown')
async def shutdown(request):
    request.app.shutdown()
    return 'The server is shutting down...'

ext = 'der' if sys.implementation.name == 'micropython' else 'pem'
folder_name = "/home/username/Documents/projects/rfid/websockets_python/"
ssl_cert = "cert.pem"
ssl_key = "key.pem"
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.load_cert_chain(ssl_cert, ssl_key)
app.run(port=4443, debug=True, ssl=sslctx)
MicroPython v1.23.0-preview.344.gb1ac266bb.dirty on 2024-04-29; Generic ESP32 module with ESP32
aiohttp sends Sec-WebSocket-Key wrapped in a b'' envelope
@Carglglz
During the handshake the Sec-WebSocket-Key is sent undecoded, so it goes out wrapped in an additional b'' (like b'tb4IfqY2SEcIEy0pv0opLQ==').
If the line:
https://github.com/micropython/micropython-lib/blob/583bc0da70049f3b200d03e919321ac8dbeb2eb8/python-ecosys/aiohttp/aiohttp/aiohttp_ws.py#L146
is changed to the following it doesn't cause an error anymore:
headers["Sec-WebSocket-Key"] = key.decode()
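As a rough illustration of how the b'' wrapper can arise, the sketch below formats a base64 key into a header value with and without decoding. The helper names are hypothetical, and the assumption that aiohttp_ws ends up string-formatting a bytes object is mine, not something confirmed by reading its source.

```python
import base64
import os

def make_key():
    # RFC 6455: the key is 16 random bytes, base64-encoded.
    return base64.b64encode(os.urandom(16))   # returns bytes

def header_value(value):
    # Hypothetical stand-in for whatever turns header values into text.
    return "{}".format(value)

key = make_key()
broken_value = header_value(key)           # "b'....=='" : repr of the bytes object
fixed_value = header_value(key.decode())   # "....=="    : plain base64 text

print(broken_value)
print(fixed_value)
```

A strict server that validates the header as base64 would reject the first form because of the b, the quotes and the extra length, which is consistent with the error log below.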
I used a python-websockets server for testing, which seems to make more sense than the other server options, since it claims to follow the RFC 6455 specification more strictly and offers informative error logging.
Here is the server error log of the handshake:
= connection is CONNECTING
< GET / HTTP/1.1
< Origin: https://192.168.178.123:4443
< Connection: Upgrade
< Sec-WebSocket-Key: b'tb4IfqY2SEcIEy0pv0opLQ=='
< Host: 192.168.178.123:4443
< Upgrade: websocket
< Sec-WebSocket-Version: 13
s_w_key="b'tb4IfqY2SEcIEy0pv0opLQ=='"
! invalid handshake
Traceback (most recent call last):
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/handshake.py", line 86, in check_request
raw_key = base64.b64decode(s_w_key.encode(), validate=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/base64.py", line 88, in b64decode
return binascii.a2b_base64(s, strict_mode=validate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
binascii.Error: Only base64 data is allowed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/server.py", line 167, in handler
await self.handshake(
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/server.py", line 609, in handshake
key = check_request(request_headers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/username/tls-test/venv/lib/python3.11/site-packages/websockets/legacy/handshake.py", line 88, in check_request
raise InvalidHeaderValue("Sec-WebSocket-Key", s_w_key) from exc
websockets.exceptions.InvalidHeaderValue: invalid Sec-WebSocket-Key header: b'tb4IfqY2SEcIEy0pv0opLQ=='
> HTTP/1.1 400 Bad Request
> Date: Tue, 07 May 2024 00:14:16 GMT
> Server: Python/3.11 websockets/12.0
> Content-Length: 102
> Content-Type: text/plain
> Connection: close
> [body] (102 bytes)
connection rejected (400 Bad Request)
x closing TCP connection
! timed out waiting for TCP close
x aborting TCP connection
= connection is CLOSED
connection closed
After the change the connection works as expected. Strangely, two other servers I tested accepted the connection despite this bug. After the code change they still connect, as expected.
Here is the server code:
#!/usr/bin/env python
import asyncio
import pathlib
import ssl
import websockets
import logging

logging.basicConfig(
    format="%(message)s",
    level=logging.DEBUG,
)

async def hello(websocket):
    while True:
        msg = await websocket.recv()
        print(f"{msg}")
        # await websocket.send(msg)

ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ssl_cert = "cert.pem"
ssl_key = "key.pem"
ssl_context.load_cert_chain(ssl_cert, ssl_key)

async def main():
    async with websockets.serve(hello, "0.0.0.0", 4443, ssl=ssl_context):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
Fixed by 2b0d7610cef301881fea2c1e8994227367196093
@wohltat your test does not do what you think it is doing, e.g.
if you add
You will see that, 1) it has a length of 20_000 and 2) there are invalid UTF bytes.
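The length claim can be reproduced with a small sketch. An empty slice target such as b[p:p] inserts bytes and grows the buffer, while a same-length slice overwrites in place. The row is built purely from bytes here (an assumption for portability), which differs from the bytes + str expression in the original script.

```python
def build(insert):
    b = bytearray(10_000)
    for k in range(100):
        p = k * 100
        # 96 + 3 + 1 = exactly 100 bytes per row
        row = b'X' * 96 + '{:3}'.format(k).encode() + b'\n'
        if insert:
            b[p:p] = row          # empty slice: inserts, buffer grows by 100 bytes
        else:
            b[p:p + 100] = row    # same-length slice: overwrites in place
    return b

inserted = build(insert=True)
replaced = build(insert=False)
print(len(inserted), len(replaced))
```

In this sketch the first 10_000 bytes are identical in both variants; the inserted version simply carries the original zero-filled 10_000 bytes behind them.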
Also here
this is sending an infinitely growing string, so I'm not sure what you want to test with this...
Anyway, here is a proper test (working on unix and esp32 ports)
and server
client
server
Hi, thanks for your quick answer.
You are right about the size of the bytearray, but the first 10_000 bytes stay the same, so I can't see how this influences the test.
I tried it again and I still have the same problems.
This was my intention. I wanted to test with varying and increasing packet sizes. Your test uses the same size of 2 kB for every packet, which was never a problem on the local server and only occasionally fails on the remote server (I don't know exactly what that one is doing).
If I set size = 5000, I can reliably make it fail on the local server right away:
With increasing packet sizes the point of failure is almost always at (or very close to) 4k for the local server. Strangely, the connection to the remote server fails rather randomly between 2k and 8k. But I don't know what kind of server that is.
The other problem, which may be connected, is a memory problem that I guess is caused by memory fragmentation. I have no other explanation, since there should be enough memory available. When I use a little memory in between the packets, the following memory error might occur. Not always though; often the connection breaks because of the errors mentioned before:
I used the following client code on the ESP32 (the server ran on my PC under Linux). The list l is there to increase memory fragmentation:
Because at some point it was sending non-UTF characters, so it looked like data corruption.
OK, I see. I'm testing this on the unix port on both sides, with increasing sizes, and at some point it "breaks", i.e. at 8000-10000 for wss and > 16000 for plain ws (Continuation frames not supported). I'm not sure if this has anything to do with the aiohttp_ws implementation or if the "bug" is at a lower level, asyncio streams maybe?
This may have something to do with partial writes 🤔 ... I may need to do further testing with CPython, and also with plain asyncio streams, and see if I can reproduce this...
I tried it again with a python-websockets server described in https://github.com/micropython/micropython-lib/issues/853#issue-2283450782.
The program also freezes and I get the following on the server side:
or also this
The websockets implementation is based on https://github.com/danni/uwebsockets, so it may be interesting to test that and see if it breaks too. Using Wireshark may help 🤔, and something like iperf3 for websockets would be nice... Also there is https://github.com/micropython/micropython/issues/12819, and https://github.com/orgs/micropython/discussions/14427#discussioncomment-9337471
[EDIT] this may help too: esp.osdebug(True, esp.LOG_INFO), and I'm seeing this
So it looks like a Wi-Fi beacon timeout...
I took a closer look in Wireshark and found something I didn't expect.
Although I'm only sending from the ESP32 to the PC server via websockets, there are a lot of TCP packets whose purpose I don't know.
The TLS packets with increasing size are probably the ones carrying the payload. As one would expect, they increase by 100 bytes every time.
The other packets are:
54 byte -> Client (to 192.168.178.64)
1494 byte -> Server (to 192.168.178.58)
54 byte -> Client
1494 byte -> Server
54 byte -> Client
then the next TLS Packet
When the connection breaks, those TCP packets are missing. This causes OSError 113 (= ECONNABORTED).

As I understand it, the OSError is not from MicroPython but from the Espressif IDF. Is this maybe some kind of connection-alive check that could be turned off?
When watching the websocket communication with a client running locally on my PC, those additional TCP packets are not there.
I also find them rather large, with around 3 kB of "useless" data transfer for potentially just a single byte of payload.
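The "around 3 kB" figure follows from adding up the five extra packets listed above; a quick check, using only the sizes reported in that list:

```python
# Sizes in bytes of the five extra TCP packets seen between two TLS packets.
extra_packets = [54, 1494, 54, 1494, 54]
overhead = sum(extra_packets)
print(overhead)
```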
I've tried to increase the beacon timeout, but it only takes longer to throw the error after the socket becomes unresponsive; it seems like the whole network stack stops... I've just tested the solution at https://github.com/micropython/micropython/issues/12819 but it still stops at some point... Same with enabling debug/verbose mode, it stops too... Well, I think it may be important to read this carefully: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/wifi.html#wi-fi-buffer-usage
@wohltat my conclusion is that after https://github.com/micropython/micropython/pull/13219 and https://github.com/micropython/micropython/pull/12141 (see e.g. https://github.com/micropython/micropython/pull/12141#issuecomment-1670418514)
this has become more relevant
see also
https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/lwip.html#lwip-performance
possible solutions:
I've spent some time researching. So far I don't have a complete solution, but I have some insights:
It helps to change some options to get some more RAM. This made it possible to send significantly longer messages before the connection broke
I also tried debug output (CONFIG_LWIP_DEBUG=y) and found that in mbedtls there is an unexpected error:
'DBG:/IDF/components/mbedtls/mbedtls/library/ssl_msg.c:2145: ssl->f_send() returned -26752 (-0x6880)'
0x6880 SSL - Connection requires a write call
(https://github.com/godotengine/godot/issues/59216)
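For reading such log lines, the decimal return value can be converted to the hex form used in the Mbed TLS headers. The sketch below does that and looks up the two "want" codes; the table is deliberately tiny and the helper name is made up for illustration.

```python
# The two Mbed TLS SSL codes that ask the caller to come back later.
MBEDTLS_CODES = {
    -0x6900: "MBEDTLS_ERR_SSL_WANT_READ",
    -0x6880: "MBEDTLS_ERR_SSL_WANT_WRITE",
}

def describe(code):
    # Format the signed value the way Mbed TLS prints it, e.g. -0x6880.
    sign = "-" if code < 0 else ""
    text = "{}0x{:04X}".format(sign, abs(code))
    return text, MBEDTLS_CODES.get(code, "unknown")

print(describe(-26752))
```

On a non-blocking socket, MBEDTLS_ERR_SSL_WANT_WRITE generally means the write could not complete yet and should be retried, which would fit a send buffer that is not draining.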
Could this have something to do with missing backpressure, as mentioned in python-websockets?
This would make sense, since every time the error occurs, the client has sent message N and the server has only received message N-2 or less. That would mean that the messages are not being sent and the send buffer is filling up. But I'm not quite sure about that theory; what do you think?
I found compilation times very slow. Every time a single option is changed, the whole IDF has to be recompiled, which takes around 3-4 minutes on my computer. Is there a way to speed this up?
I tried to remove Bluetooth from the firmware to free up some more memory, but I always get error messages. I could not find anything about how to correctly remove BT. I'm not quite sure, but I think I remember that you said something about removing BT before.
I also tried compiling with IDF v5.2, which is supported according to the MicroPython documentation.
That turned out not to work at all so far.
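For reference, this is roughly what backpressure looks like with plain asyncio streams on CPython: write() only queues data, and awaiting drain() pauses the sender while the transport's buffer is too full. It is a loopback sketch with made-up sizes, not a reproduction of the ESP32 failure. The tracebacks above show aiohttp_ws already calling drain() in send, so this only illustrates the concept.

```python
import asyncio

async def demo():
    received = []
    done = asyncio.Event()

    async def handle(reader, writer):
        # Server side: count bytes until the client closes the connection.
        while True:
            data = await reader.read(4096)
            if not data:
                break
            received.append(len(data))
        writer.close()
        done.set()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    sent = 0
    for n in range(1, 51):
        payload = b"X" * (100 * n)   # growing messages, like the test script
        writer.write(payload)        # queues the data in the send buffer
        await writer.drain()         # waits if the buffer is over its limit
        sent += len(payload)
    writer.close()
    await writer.wait_closed()

    await done.wait()                # server has seen end of stream
    server.close()
    return sent, sum(received)

sent, got = asyncio.run(demo())
print(sent, got)
```

If a sender skipped the drain() step, or the layer below reported data as written while it was still buffered, the sender's message counter could run ahead of what the server has actually received, which matches the N versus N-2 observation.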
MBEDTLS_ERR_MPI_ALLOC_FAILED when running my application. Then I followed the steps to optimize memory for Mbed TLS
Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.
Yes, the problem is basically this:
The solution is to preallocate what you really need in MicroPython and make sure it does not allocate over a certain size, leaving enough memory for the IDF heap/Wi-Fi buffers. Also be aware that slicing, e.g. b[0:100*n], allocates memory, so in your example this is going to lead to memory fragmentation and then the above-mentioned "overallocation".
Yes, but it will depend on the number of cores/threads of your computer. Use the -j <N> option with the make command; see $ man make.
IDF v5.1.2 is the latest I've tested which seems to work.