esp32/mpthreadport: Crashed in `vPortCleanUpTCB `
Hi,
When I create a task with freertos interface xTaskCreatePinnedToCore and pinned the task to core 0 (duel to the cpu loading) which is different with micropython's core 1, crashes may occur when task finished.
- micropython commit: 1dedb65e645f72afbd614431435208d687e16992
- idf commit: cbce221e (this is idf v5.1 and idf 5.0.2 can also reproduce this issue, but most of the time no backtrace, so i pasted the backtrace tested with idf 5.1)
Recording: *********Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0).
Core 0 register dump:
PC : 0x400992a1 PS : 0x00060335 A0 : 0x8022b6f2 A1 : 0x3ffbab40
0x400992a1: spinlock_acquire at /home/lehe/esp/esp-idf-internal/components/esp_hw_support/include/spinlock.h:112
(inlined by) xPortEnterCriticalTimeout at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:501
A2 : 0x3ffbdc2c A3 : 0xffffffff A4 : 0x0000cdcd A5 : 0xb33fffff
A6 : 0x91886a06 A7 : 0x00060323 A8 : 0x800992ad A9 : 0x3ffbab20
A10 : 0x00000000 A11 : 0xb33f5454 A12 : 0x0000abab A13 : 0xb33fffff
A14 : 0x00001f46 A15 : 0x00060b23 SAR : 0x0000000d EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x4009661c LEND : 0x40096638 LCOUNT : 0x00000000
0x4009661c: memcpy at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/machine/xtensa/memcpy.S:175
0x40096638: memcpy at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/machine/xtensa/memcpy.S:197
Backtrace: 0x4009929e:0x3ffbab40 0x4022b6ef:0x3ffbab70 0x4022a1aa:0x3ffbab90 0x400d9cf5:0x3ffbabb0 0x400daa58:0x3ffbabd0 0x400d32a2:0x3ffbabf0 0x400d3164:0x3ffbac30 0x400d5f1d:0x3ffbac70
0x4009929e: xPortEnterCriticalTimeout at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:515
0x4022b6ef: vPortEnterCritical at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/portable/xtensa/include/freertos/portmacro.h:575
(inlined by) vTaskTakeKernelLock at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/tasks.c:4073
0x4022a1aa: vEventGroupDelete at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/event_groups.c:663
0x400d9cf5: audio_element_deinit at /home/lehe/esp/esp-adf-internal/components/audio_pipeline/audio_element.c:1049
0x400daa58: audio_pipeline_deinit at /home/lehe/esp/esp-adf-internal/components/audio_pipeline/audio_pipeline.c:270 (discriminator 3)
0x400d32a2: audio_recorder_stop at audio_recorder.c:?
0x400d3164: audio_recorder_maxtime_cb at audio_recorder.c:?
0x400d5f1d: timer_process_alarm at /home/lehe/esp/esp-idf-internal/components/esp_timer/src/esp_timer.c:451
(inlined by) timer_task at /home/lehe/esp/esp-idf-internal/components/esp_timer/src/esp_timer.c:477
Core 1 register dump:
PC : 0x400991e9 PS : 0x00060735 A0 : 0x8022b235 A1 : 0x3ffbc7a0
0x400991e9: vListInsert at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/list.c:171 (discriminator 1)
A2 : 0x3ffc3eb4 A3 : 0x3ffb7dec A4 : 0x00000004 A5 : 0xb33fffff
A6 : 0x00001f48 A7 : 0x00060723 A8 : 0x3ffb7dec A9 : 0x3ffb7dec
A10 : 0x00000019 A11 : 0x00000019 A12 : 0xb33fffff A13 : 0xb33fffff
A14 : 0x00000000 A15 : 0x00060723 SAR : 0x00000020 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000
Backtrace: 0x400991e6:0x3ffbc7a0 0x4022b232:0x3ffbc7c0 0x4022a9ee:0x3ffbc7e0 0x400fbbf5:0x3ffbc820 0x400fbd8b:0x3ffbc840 0x4022acbd:0x3ffbc860 0x4022ad9d:0x3ffbc880
0x400991e6: vListInsert at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/list.c:171 (discriminator 1)
0x4022b232: vTaskPlaceOnEventList at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/tasks.c:3842
0x4022a9ee: xQueueSemaphoreTake at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/queue.c:1772
0x400fbbf5: mp_thread_mutex_lock at /home/lehe/esp/micropython/ports/esp32/mpthreadport.c:207
0x400fbd8b: vPortCleanUpTCB at /home/lehe/esp/micropython/ports/esp32/mpthreadport.c:182
0x4022acbd: prvDeleteTCB at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/tasks.c:4948
0x4022ad9d: prvCheckTasksWaitingTermination at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/tasks.c:4664
(inlined by) prvIdleTask at /home/lehe/esp/esp-idf-internal/components/freertos/FreeRTOS-Kernel/tasks.c:4321
ELF file SHA256: df43f11bea27ef89
Rebooting...
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4748
ho 0 tail 12 room 4
load:0x40078000,len:14548
load:0x40080400,len:4
0x40080400: _init at ??:?
load:0x40080404,len:3380
entry 0x400805c4
['', '.frozen', '/lib', '/sdcard', '/sdcard/lib']
MicroPython v1.20.0-396-g1dedb65e6-dirty on 2023-08-28; ESP-Audio Lyart v4.3 with ESP32
Type "help()" for more information.
>>>
esp32/mpthreadport: Fix double delete of tasks on soft reset.
Summary
Python threads (created via the _thread module) are backed by a FreeRTOS task. Managing the deletion of the task can be tricky, and there are currently some bugs with this in the esp32 port.
The actual crash I saw was in FreeRTOS' uxListRemove(), and that's because of two calls to vTaskDelete() for the same task: one in freertos_entry() when the task ran to completion, and the other in mp_thread_deinit(). The latter tried to delete the task a second time because it was still in the linked list, because vTaskPreDeletionHook() had not yet been called. And the reason vTaskPreDeletionHook() was yet to be called is because the FreeRTOS idle task was starved.
This PR attempts to fix that.
There are two things done by this fix:
- make sure
vTaskDelete()is only called once for each task, either when the Python thread finishes, or inmp_thread_deinit()(otherwise a double free leads to memory corruption within FreeRTOS) - make
mp_thread_deinit()to wait for all remaining tasks to have theirvTaskPreDeletionHook()called (otherwisevTaskPreDeletionHookmay be called after the soft reset, when the MP GC heap no longer contains the validmp_thread_tstructs)
Testing
The bugs here were found by running the standard test suite over and over again on an ESP32S2, with ESP32_GENERIC_S2 firmware. But it was very difficult to reproduce. The following was needed to see a crash:
- ESP32S2, because it's single core (on dual core, task cleanup seems to happen much faster)
- run a test that outputs lots of data, that seems to saturate the USB CDC task and starve the idle task
- then run a test that creates a thread
- then run another test so soft reset occurs
- the timing of the host PC processing USB CDC data also has an effect! eg if the host PC is very busy then the bug may show up more easily or not
- IDF 5.4.1 showed the bug much more easily than IDF 5.2.2
I found the following test command allowed the bug to manifest:
$ cd tests
$ while true; do ./run-tests.py -t a0 extmod/framebuf16.py extmod/framebuf_ellipse.py extmod/select_poll_eintr.py || break; sleep 1s; done
With the fix in this PR the bug seems to be gone, I can no longer reproduce it. Tested on ESP32 and ESP32S2, with IDF 5.4.1.