Add VM hook after every opcode (MICROPY_VM_HOOK_OPCODE)
My OS uses cooperative multitasking, i.e. a context switch must be explicitly initiated via RTOS.yield(). This means that in order to integrate MicroPython into my RTOS, I have to find a spot to place the RTOS.yield() call, and I've found one.
Original code, vm.c
https://github.com/micropython/micropython/blob/cb9da2279b021ea7a916cbf16dd0e05c009c22bb/py/vm.c#L136
```c
#define DISPATCH() do { \
    TRACE(ip); \
    MARK_EXC_IP_GLOBAL(); \
    goto *entry_table[*ip++]; \
} while (0)
```
Modified code:
```c
#define DISPATCH() do { \
    TRACE(ip); \
    MARK_EXC_IP_GLOBAL(); \
    RTOS.yield(); \
    goto *entry_table[*ip++]; \
} while (0)
```
But this is not efficient because it yields after every single opcode, and I don't need it to yield that often. The next optimization step is to yield only every X opcodes:
```c
#define DISPATCH() do { \
    TRACE(ip); \
    MARK_EXC_IP_GLOBAL(); \
    if (counter++ % NUM_OPS_EXC == 0) RTOS.yield(); \
    goto *entry_table[*ip++]; \
} while (0)
```
But I don't really care how many opcodes are executed; I care how much time has elapsed (5 us, 50 us, etc.). So what I'd really like is a down-counter that is decremented by an ISR and saturates at zero (doesn't wrap around):
```c
#define DISPATCH() do { \
    TRACE(ip); \
    MARK_EXC_IP_GLOBAL(); \
    if (DownCounter == 0) RTOS.yield(); \
    goto *entry_table[*ip++]; \
} while (0)
```
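The timer side of this scheme could be sketched as follows. All names here (timer_isr, DOWNCOUNTER_RELOAD, downcounter_reload) are hypothetical; the actual timer setup is port/MCU-specific:

```c
#include <stdint.h>

#define DOWNCOUNTER_RELOAD 50  // timer ticks per VM time slice (example value)

// Shared between the timer ISR and the VM loop, hence volatile.
static volatile uint32_t DownCounter = DOWNCOUNTER_RELOAD;

// Periodic timer ISR: count down and saturate at zero rather than
// wrapping around, so the VM's check stays a simple == 0 test.
void timer_isr(void) {
    if (DownCounter > 0) {
        DownCounter--;
    }
}

// Called when the task is resumed (e.g. from within RTOS.yield())
// to start a fresh time slice.
static inline void downcounter_reload(void) {
    DownCounter = DOWNCOUNTER_RELOAD;
}
```

The reload on resume is what turns the saturating counter into a per-slice timeout rather than a one-shot.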
In this configuration, the worst-case time between yields is the DownCounter timeout plus the execution time of the longest single opcode.
Since #define DISPATCH() already exists, why even create this issue? Because in a future refactoring of MicroPython, DISPATCH() may be removed, and I'd have to find another point to "hack in" my cooperative-multitasking fix; it may not be as clean and simple as it is today.
I request a placeholder (a construct: macro, function callback, etc.) where a user-defined RTOS.yield() can be inserted. Given today's layout, the following makes sense to me:
```c
#define DISPATCH() do { \
    TRACE(ip); \
    MARK_EXC_IP_GLOBAL(); \
    RTOS_YIELD(); \
    goto *entry_table[*ip++]; \
} while (0)
```
and if MICROPY_OPT_COMPUTED_GOTO is not defined, you'd have
```c
#define DISPATCH() RTOS_YIELD(); break
```
In mpconfigport.h (wrapped in do/while so the macro stays a single statement and avoids dangling-else problems):
```c
#define RTOS_YIELD() \
    do { \
        if (counter++ % NUM_OPS_EXC == 0) RTOS.yield(); \
    } while (0)
```
and in mpconfig.h, you'd add (note that #ifndef takes the bare macro name, without parentheses):
```c
#ifndef RTOS_YIELD
#define RTOS_YIELD()
#endif
```
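The default-empty hook pattern can be demonstrated stand-alone. In this sketch, rtos_yield() and run_opcodes() are hypothetical stubs standing in for the real RTOS call and the VM dispatch loop:

```c
#include <stdint.h>

// Stubs: rtos_yield() just counts how often the hook fires.
static uint32_t yield_count;
static uint32_t opcode_counter;

static void rtos_yield(void) {
    yield_count++;
}

// Port-level definition (would live in mpconfigport.h):
#define NUM_OPS_EXC 64
#define RTOS_YIELD() \
    do { \
        if (opcode_counter++ % NUM_OPS_EXC == 0) rtos_yield(); \
    } while (0)

// Core-level fallback (would live in mpconfig.h): compiles to nothing
// when the port defines no hook, so the VM is unchanged by default.
#ifndef RTOS_YIELD
#define RTOS_YIELD()
#endif

// Simulated dispatch loop executing n "opcodes".
static void run_opcodes(uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        RTOS_YIELD();
    }
}
```

With NUM_OPS_EXC = 64, executing 640 opcodes fires the hook 10 times (at opcodes 0, 64, ..., 576).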
Add multiple hooks to garbage collection to enable cooperative multitasking
In #3462 it was suggested that adding a hook after every opcode execution would not "solve" determinism, because the duration of an "execute function" opcode is determined by the duration of the underlying C function.
@dpgeorge said:
As @stinos mentions, even doing this won't be enough to get each opcode executing within strict time limits. For example sorting a list is a single opcode (the function/method call) and that can take an arbitrary time depending on the size of the list (eg on pyboard sorting a sorted list of 100 elements takes 2.6ms, and 1000 elements takes 249ms). Also inserting into a dict may require a rehash of the dict and then a garbage collection which can make the insertion time vary from between 30us to 1500us (approx).
And I agree, but there's a difference between sort (and other utilities) and the gc:
- You don't need sort and other such functions for the Python language to run. If you wanted, you could even write a sort function in Python! Practically speaking, however, you do need a garbage collector, and you cannot write it in Python.
- It's far simpler to write your own sort in C with the necessary RTOS.yield() hooks than it is to write a garbage collector.
My request
In gc_collect_root() and its callees, wherever there's a loop, insert a hook to allow a call to RTOS.yield()
https://github.com/micropython/micropython/blob/9ebc037eee575fd951dea92c82ed9704d9101924/py/gc.c#L315
modified:
```c
void gc_collect_root(void **ptrs, size_t len) {
    for (size_t i = 0; i < len; i++) {
        void *ptr = ptrs[i];
        VERIFY_MARK_AND_PUSH(ptr);
        gc_drain_stack();
        GC_COLLECT_ROOT_HOOK();
    }
}
```
modified:
```c
STATIC void gc_drain_stack(void) {
    while (MP_STATE_MEM(gc_sp) > MP_STATE_MEM(gc_stack)) {
        GC_DRAIN_STACK_OUTER_LOOP_HOOK();
        // pop the next block off the stack
        size_t block = *--MP_STATE_MEM(gc_sp);
        // work out number of consecutive blocks in the chain starting with this one
        size_t n_blocks = 0;
        do {
            n_blocks += 1;
            GC_DRAIN_STACK_INNER_WHILE_HOOK();
        } while (ATB_GET_KIND(block + n_blocks) == AT_TAIL);
        // check this block's children
        void **ptrs = (void**)PTR_FROM_BLOCK(block);
        for (size_t i = n_blocks * BYTES_PER_BLOCK / sizeof(void*); i > 0; i--, ptrs++) {
            void *ptr = *ptrs;
            VERIFY_MARK_AND_PUSH(ptr);
            GC_DRAIN_STACK_INNER_FOR_LOOP_HOOK();
        }
    }
}
```
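For completeness, the core would ship default-empty definitions of these hooks (a sketch using the names from the proposal above), so the modified gc.c compiles to exactly the original code unless a port overrides them:

```c
// Default-empty GC hooks: zero cost when a port defines nothing.
#ifndef GC_COLLECT_ROOT_HOOK
#define GC_COLLECT_ROOT_HOOK()
#endif
#ifndef GC_DRAIN_STACK_OUTER_LOOP_HOOK
#define GC_DRAIN_STACK_OUTER_LOOP_HOOK()
#endif
#ifndef GC_DRAIN_STACK_INNER_WHILE_HOOK
#define GC_DRAIN_STACK_INNER_WHILE_HOOK()
#endif
#ifndef GC_DRAIN_STACK_INNER_FOR_LOOP_HOOK
#define GC_DRAIN_STACK_INNER_FOR_LOOP_HOOK()
#endif
```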
Yes, it's inefficient, so most people won't use it; but for those who cannot wait >600 us for the gc to run, these macros are necessary.
Optimizations
- Immediately, I can see that each loop could be refactored so that it executes x times (configurable) before checking whether it has to yield.
- The "configuration interface" should be simple for the dev/user. With the macros as I've currently defined them, the user needs knowledge of the gc code to understand how the macros will impact the timing. Instead, the config should be something like: setting GC_COOP_MULTI to 0 results in the gc not calling GC_HOOK_CHECK() at all; each increase, up to a MAX, proportionally increases the frequency of calls to GC_HOOK_CHECK().
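Both optimizations could be combined into a single rate-limited hook. The following is only a sketch: GC_HOOK_CHECK(), GC_COOP_MULTI, and GC_HOOK_MAX_BUDGET are hypothetical names, and a call counter stands in for RTOS.yield():

```c
#include <stdint.h>

// GC_COOP_MULTI = 0 disables the hook entirely; larger values check
// more often by shrinking the iteration budget between yields.
#define GC_COOP_MULTI 4
#define GC_HOOK_MAX_BUDGET 1024

static uint32_t gc_hook_budget;
static uint32_t gc_yield_count;  // stand-in for RTOS.yield()

#if GC_COOP_MULTI == 0
#define GC_HOOK_CHECK()
#else
#define GC_HOOK_CHECK() \
    do { \
        if (gc_hook_budget-- == 0) { \
            gc_hook_budget = GC_HOOK_MAX_BUDGET / GC_COOP_MULTI; \
            gc_yield_count++; /* would be RTOS.yield() */ \
        } \
    } while (0)
#endif

// Simulates n hook sites being reached inside the gc loops.
static void gc_hook_run(uint32_t n) {
    for (uint32_t i = 0; i < n; i++) {
        GC_HOOK_CHECK();
    }
}
```

This keeps the per-iteration cost to a decrement and a compare, and the user only tunes the single GC_COOP_MULTI knob rather than reasoning about the individual gc loops.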