esp32 cannot create thread after opening ap
esp32/mpthreadport: Fix double delete of tasks on soft reset.
Summary
Python threads (created via the _thread module) are backed by a FreeRTOS task. Managing the deletion of the task can be tricky, and there are currently some bugs with this in the esp32 port.
The actual crash I saw was in FreeRTOS' uxListRemove(), caused by two calls to vTaskDelete() for the same task: one in freertos_entry() when the task ran to completion, and the other in mp_thread_deinit(). The latter tried to delete the task a second time because the task was still in the linked list, since vTaskPreDeletionHook() had not yet been called. And vTaskPreDeletionHook() had not yet been called because the FreeRTOS idle task was starved.
This PR attempts to fix that.
There are two things done by this fix:
- make sure vTaskDelete() is only called once for each task, either when the Python thread finishes, or in mp_thread_deinit() (otherwise a double free leads to memory corruption within FreeRTOS)
- make mp_thread_deinit() wait for all remaining tasks to have their vTaskPreDeletionHook() called (otherwise vTaskPreDeletionHook() may be called after the soft reset, when the MP GC heap no longer contains the valid mp_thread_t structs)
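The two rules above can be illustrated with a toy model in Python. This is only a sketch of the bookkeeping, not the C code in the esp32 port: the names TaskList, finish, pre_deletion_hook and deinit are hypothetical, a counter stands in for vTaskDelete(), and a callback stands in for the FreeRTOS idle task.

```python
# Toy model of the task bookkeeping (hypothetical names, not the port's code).

class Task:
    def __init__(self, name):
        self.name = name
        self.delete_requested = False  # has the stand-in for vTaskDelete() run?
        self.delete_calls = 0          # counts calls, to detect a double delete


class TaskList:
    def __init__(self):
        self.tasks = []  # tasks whose pre-deletion hook has not run yet

    def create(self, name):
        task = Task(name)
        self.tasks.append(task)
        return task

    def _delete_once(self, task):
        # Rule 1: request deletion at most once per task.
        if not task.delete_requested:
            task.delete_requested = True
            task.delete_calls += 1

    def finish(self, task):
        # The Python thread ran to completion.
        self._delete_once(task)

    def pre_deletion_hook(self, task):
        # Stand-in for the idle task eventually cleaning the task up.
        self.tasks.remove(task)

    def deinit(self, run_idle):
        # Soft reset: delete whatever is still listed, at most once each...
        for task in list(self.tasks):
            self._delete_once(task)
        # Rule 2: ...then wait until every hook has run before returning.
        while self.tasks:
            run_idle()


tl = TaskList()
done = tl.create("finished-thread")
busy = tl.create("running-thread")
all_tasks = [done, busy]
tl.finish(done)  # the hook has not run yet, so the task is still listed


def run_idle():
    # Simulated idle task: process one pending deletion per call.
    tl.pre_deletion_hook(tl.tasks[0])


tl.deinit(run_idle)
print([t.delete_calls for t in all_tasks], len(tl.tasks))  # [1, 1] 0
```

Without the delete_requested check, the finished thread would be deleted a second time in deinit(), which is the double delete described in the summary.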
Testing
The bugs here were found by running the standard test suite over and over again on an ESP32S2, with ESP32_GENERIC_S2 firmware. But it was very difficult to reproduce. The following was needed to see a crash:
- ESP32S2, because it's single core (on dual core, task cleanup seems to happen much faster)
- run a test that outputs lots of data, that seems to saturate the USB CDC task and starve the idle task
- then run a test that creates a thread
- then run another test so soft reset occurs
- the timing of the host PC processing USB CDC data also has an effect, e.g. depending on how busy the host PC is, the bug may show up more or less easily
- IDF 5.4.1 showed the bug much more easily than IDF 5.2.2
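For the "test that creates a thread" step, any script that starts a thread and lets it run to completion should exercise the same task deletion path. The following is a minimal sketch of such a script, not one of the tests from the suite; the worker function and the 5 second timeout are arbitrary choices for illustration.

```python
# Minimal sketch: start one thread via _thread and let it run to completion.
import _thread
import time

lock = _thread.allocate_lock()
results = []


def worker(n):
    # Runs in the new thread, then returns, so the thread finishes on its own.
    lock.acquire()
    results.append(n * 2)
    lock.release()


_thread.start_new_thread(worker, (21,))

# Poll until the worker has stored its result, or give up after 5 seconds.
deadline = time.time() + 5
while time.time() < deadline:
    lock.acquire()
    finished = len(results) > 0
    lock.release()
    if finished:
        break
    time.sleep(0.01)

print(results)
```

If another test runs immediately afterwards, the soft reset between the two tests happens shortly after this thread has finished, which is the window in which the crash was observed.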
I found the following test command allowed the bug to manifest:
$ cd tests
$ while true; do ./run-tests.py -t a0 extmod/framebuf16.py extmod/framebuf_ellipse.py extmod/select_poll_eintr.py || break; sleep 1s; done
With the fix in this PR the bug seems to be gone; I can no longer reproduce it. Tested on ESP32 and ESP32S2, with IDF 5.4.1.