TLS errors on ESP-IDF 5.4 with default MICROPY_GC_INITIAL_HEAP_SIZE
Port, board and/or hardware
esp32 port
MicroPython version
➜ esp32 git:(master) git describe --dirty
v1.24.0-224-ga4ab84768
Reproduction
Making single requests to 2 different TLS endpoints (which have small payloads).
➜ esp32 git:(master) ✗ idf.py --version
ESP-IDF v5.4
...
Chip is ESP32-D0WDQ6 (revision v0.0)
Features: WiFi, BT, Dual Core, Coding Scheme None
Crystal is 40MHz
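A minimal sketch of the kind of reproduction described above, assuming MicroPython's `requests` package is installed on the board. The helper name `fetch_each_once` and the two URLs are hypothetical placeholders, not the endpoints used in the original report. The `get` callable is passed in so the same function can also be exercised off-device.

```python
def fetch_each_once(urls, get):
    """Request each URL once and return a list of (url, outcome) pairs.

    `get` is any callable with the shape of requests.get(url).
    """
    results = []
    for url in urls:
        try:
            resp = get(url)
            try:
                results.append((url, resp.status_code))
            finally:
                # Always release the socket and TLS context.
                resp.close()
        except OSError as exc:
            # On MicroPython, TLS and socket failures surface as OSError.
            results.append((url, repr(exc)))
    return results


# Placeholder endpoints (hypothetical); substitute two small HTTPS resources.
URLS = ["https://example.com/", "https://example.org/"]

# On the device, after WiFi is connected:
#   import requests
#   print(fetch_each_once(URLS, requests.get))
```

Running this in a loop with the default heap size would be one way to look for the intermittent failures listed under "Observed behaviour".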
Expected behaviour
IDF 5.3 was stable without any change to MICROPY_GC_INITIAL_HEAP_SIZE.
➜ esp32 git:(master) idf.py --version
ESP-IDF v5.3
Observed behaviour
I was getting various intermittent lock-ups (including the device no longer responding to pings over the network) or an OSError on TLS HTTP requests:
(-17040, 'MBEDTLS_ERR_RSA_PUBLIC_FAILED+MBEDTLS_ERR_MPI_ALLOC_FAILED')
(-29312, 'MBEDTLS_ERR_SSL_CONN_EOF')
[Errno 104] ECONNRESET
Workaround: after lowering MICROPY_GC_INITIAL_HEAP_SIZE from (56 * 1024) to (52 * 1024), none of these errors occurred. The errors still occurred with the value set to (54 * 1024).
Additional Information
It is unstable on 5.3.2 as well, but I noted that 5.3.2 is not in the "approved" IDF versions. I am not sure whether the patch version matters, but because 5.2 and 5.2.2 are explicitly listed, I assume it does. IDF does not have a 5.3.0 release, so I think the MicroPython esp32 port just lists 5.3. It may be worth stating that supported patch versions will be explicitly listed.
It was relatively stable on 5.3.1 (not as bad as 5.3.2).
Code of Conduct
Yes, I agree
esp32: update IDF to 5.4.2
Summary
This PR updates the esp32 port to use IDF 5.4.2 (instead of 5.4.1).
The CI will now use IDF 5.4.2, and the README recommends this as the officially supported version. Downloads will also use the new version.
Thanks to @projectgus, here is a summary of the change in firmware size and RAM usage moving from 5.4.1 to 5.4.2:
| BOARD | BOARD_VARIANT | IDF Version | Binary Size | Static IRAM Size | Static DRAM Size |
|---|---|---|---|---|---|
| ESP32_GENERIC | | v5.4.1 | 1656557 | 117286 | 56136 |
| ESP32_GENERIC | | v5.4.2 | 1658709 | 116894 | 53948 |
| ESP32_GENERIC | D2WD | v5.4.1 | 1371197 | 111174 | 55840 |
| ESP32_GENERIC | D2WD | v5.4.2 | 1372697 | 110714 | 53616 |
| ESP32_GENERIC | SPIRAM | v5.4.1 | 1481400 | 116430 | 56264 |
| ESP32_GENERIC | SPIRAM | v5.4.2 | 1482900 | 115998 | 54068 |
| ESP32_GENERIC_S3 | | v5.4.1 | 1635902 | 16383 | 143295 |
| ESP32_GENERIC_S3 | | v5.4.2 | 1645006 | 16383 | 142059 |
| ESP32_GENERIC_S3 | SPIRAM_OCT | v5.4.1 | 1639314 | 16383 | 146727 |
| ESP32_GENERIC_S3 | SPIRAM_OCT | v5.4.2 | 1648446 | 16383 | 145519 |
Compared with 5.4.1, firmware size is up by about 1.5k to 2k on ESP32 and about 9k on ESP32-S3. IRAM usage (of the IDF) is down by about 400 to 500 bytes on ESP32 and unchanged on ESP32-S3, and DRAM usage is down by about 2k on ESP32 and about 1.2k on ESP32-S3.
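For reference, the per-board differences can be recomputed directly from the table. This is only a small arithmetic sketch; the numbers are copied from the table above, and the `SIZES` layout and `deltas` helper are illustrative names, not part of the MicroPython tooling.

```python
# (board, variant) -> {IDF version: (binary size, static IRAM, static DRAM)}
SIZES = {
    ("ESP32_GENERIC", ""): {
        "v5.4.1": (1656557, 117286, 56136),
        "v5.4.2": (1658709, 116894, 53948),
    },
    ("ESP32_GENERIC", "D2WD"): {
        "v5.4.1": (1371197, 111174, 55840),
        "v5.4.2": (1372697, 110714, 53616),
    },
    ("ESP32_GENERIC", "SPIRAM"): {
        "v5.4.1": (1481400, 116430, 56264),
        "v5.4.2": (1482900, 115998, 54068),
    },
    ("ESP32_GENERIC_S3", ""): {
        "v5.4.1": (1635902, 16383, 143295),
        "v5.4.2": (1645006, 16383, 142059),
    },
    ("ESP32_GENERIC_S3", "SPIRAM_OCT"): {
        "v5.4.1": (1639314, 16383, 146727),
        "v5.4.2": (1648446, 16383, 145519),
    },
}


def deltas(sizes, old="v5.4.1", new="v5.4.2"):
    """Return {(board, variant): (d_binary, d_iram, d_dram)} as new minus old."""
    return {
        key: tuple(n - o for o, n in zip(by_version[old], by_version[new]))
        for key, by_version in sizes.items()
    }


for (board, variant), (d_bin, d_iram, d_dram) in deltas(SIZES).items():
    name = board + ("-" + variant if variant else "")
    print(name, "binary", d_bin, "IRAM", d_iram, "DRAM", d_dram)
```

Negative values mean the figure shrank going from 5.4.1 to 5.4.2.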
Testing
I ran the full test suite (Python, .mpy, native, hardware, BLE, WiFi) on ESP32, ESP32-S2, ESP32-S3 and ESP32-C3. I did not see any regressions.
However, I did see a change in BLE event behaviour which makes tests/multi_bluetooth/ble_mtu.py and tests/multi_bluetooth/ble_mtu_peripheral.py fail on ESP32 with IDF 5.4.2. The change is that MTU_EXCHANGE events can now occur before CENTRAL_CONNECT/PERIPHERAL_CONNECT events. That seems a bit strange, because the MTU exchange occurs after the connection. Looking at the timing of the events, there is exactly 100ms between them, i.e. MTU_EXCHANGE fires and then exactly 100ms later CENTRAL_CONNECT/PERIPHERAL_CONNECT fires.
I don't know if this is a bug in (Espressif's) NimBLE, a subtle change in scheduling with still valid behaviour, an intended change, a change allowed under the BLE spec, or something else. But I doubt we can fix that (easily) on our side and so in order to move forward with updating to IDF 5.4.2 I have adjusted the relevant tests so they can pass (basically, the test just needs to wait a bit between doing the connect and doing the MTU exchange, so the other side sees the original/correct ordering of events).
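The shape of that adjustment can be sketched as follows. This is not the actual test code: `connect_then_exchange_mtu`, the `wait_for_connect` callback and the 500ms default are assumptions made for illustration. Only `gap_connect()` and `gattc_exchange_mtu()` are real methods of MicroPython's `bluetooth.BLE` object.

```python
import time


def sleep_ms(ms):
    # MicroPython provides time.sleep_ms(); fall back to time.sleep() elsewhere.
    if hasattr(time, "sleep_ms"):
        time.sleep_ms(ms)
    else:
        time.sleep(ms / 1000)


def connect_then_exchange_mtu(ble, addr_type, addr, wait_for_connect, settle_ms=500):
    """Connect, pause briefly, then start the MTU exchange.

    `wait_for_connect` is a hypothetical callback that blocks until the
    local connect event has been seen and returns the connection handle.
    `settle_ms` is an assumed value, chosen to exceed the 100ms gap noted above.
    """
    ble.gap_connect(addr_type, addr)
    conn_handle = wait_for_connect()
    # Give the remote side time to deliver its own connect event first.
    sleep_ms(settle_ms)
    ble.gattc_exchange_mtu(conn_handle)
    return conn_handle
```

The pause only works around the ordering seen by the other side; it does not explain why the events arrive out of order.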
I have tested the modified tests on PYBD_SF6, RPI_PICO_W, ESP32 (IDF 5.4.1 and IDF 5.4.2) and ESP32-S3 (IDF 5.4.1 and IDF 5.4.2). They pass.
Trade-offs and Alternatives
- We need to keep up with IDF releases, so there's not much alternative. This is only a patch release update, so it really shouldn't have many semantic changes.
- Instead of trying to hunt down why the BLE event ordering has changed, I just opted to tweak the test.