QUERY · ISSUE

Add means for accessing locals when MicroPython is built with sys.settrace

openby aivarannamaaopened 2021-01-26updated 2024-09-13

enhancement

sys.settrace is a great step towards having a usable MicroPython debugger, but as long as it's not possible to query local variables, people can't replace their print-debugging with a debugger based on sys.settrace.

My proposal is to somehow expose locals when sys.settrace is enabled. It doesn't need to be a Dict[str, Any]. If the exact rules are published for mapping local names to positions in an array, the debugger can still present a conventional view to the user. It's alright if some locals (eg. created by constant expressions) don't end up in this array at all, as long as the rules are clear.

If something like this gets done, then I'd do my best to create a beginner-friendly MicroPython debugger for Thonny IDE.

CANDIDATE · PULL REQUEST

py: Add a new dual-VM option to improve performance of `sys.settrace`

openby dpgeorgeopened 2025-12-15updated 2026-03-17

port-webassemblypy-core

Summary

MicroPython supports the standard sys.settrace() facility, to trace bytecode execution. That feature is enabled through MICROPY_PY_SYS_SETTRACE and is disabled by default because it really slows the VM down, and increases memory use in many places due to the overhead of tracking the state of functions.

This PR adds a new option MICROPY_PY_SYS_SETTRACE_DUAL_VM that, when enabled, duplicates the VM, the mp_execute_bytecode() function:

One version is called mp_execute_bytecode_standard() and has settrace disabled.
The other version is called mp_execute_bytecode_tracing() and has settrace enabled.

Then a wrapper function mp_execute_bytecode() calls into the appropriate VM depending on whether the user has enabled a settrace callback or not.

The result is:

An increase of about 5k in firmware size (on ARM Cortex-M) when enabling the dual-VM (that's on top of having settrace already enabled).
Almost no performance is lost if a settrace callback is not registered.
Full support for settrace if needed.

This is useful for, eg, webassembly where PyScript wants to have sys.settrace enabled, and can afford to have a dual-VM in order not to lose performance with settrace enabled.

This PR also includes some minor fixes for the webassembly port to be able to run the performance benchmarks on that port.

Testing

Tested by running the standard test suite on the unix and webassembly ports, which also have dual-VM enabled in CI.

On the webassembly port, the perf_bench/misc_aes.py performance test runs 200 times faster (!) with dual-VM enabled, compared to just enabling settrace without the dual-VM. So it's a very big win there.

On PYBV10 the metrics are:

code size increases by 14.5k with MICROPY_PY_SYS_SETTRACE enabled
code size increases by an extra 5.3k with MICROPY_PY_SYS_SETTRACE_DUAL_VM enabled as well
performance is roughly halved when enabling settrace
performance is pretty much all regained when enabling dual-VM

The performance measurements on PYBV10 in detail

Enabling MICROPY_PY_SYS_SETTRACE:

diff of scores (higher is better)
N=100 M=100                perf-pyb-std -> perf-pyb-settrace         diff      diff% (error%)
bm_chaos.py                    356.93 ->     261.95 :     -94.98 = -26.610% (+/-0.01%)
bm_fannkuch.py                  72.90 ->      61.28 :     -11.62 = -15.940% (+/-0.00%)
bm_fft.py                     2331.03 ->    1681.29 :    -649.74 = -27.874% (+/-0.02%)
bm_hexiom.py                    45.75 ->      13.35 :     -32.40 = -70.820% (+/-0.00%)
bm_nqueens.py                 4111.33 ->    2695.70 :   -1415.63 = -34.432% (+/-0.00%)
bm_pidigits.py                 663.16 ->     606.83 :     -56.33 =  -8.494% (+/-0.02%)
bm_wordcount.py                 47.36 ->      39.05 :      -8.31 = -17.546% (+/-0.01%)
core_import_mpy_multi.py       640.86 ->     483.22 :    -157.64 = -24.598% (+/-0.01%)
core_import_mpy_single.py      105.97 ->      84.55 :     -21.42 = -20.213% (+/-0.01%)
core_locals.py                  40.28 ->      32.41 :      -7.87 = -19.538% (+/-0.00%)
core_qstr.py                   204.36 ->     192.74 :     -11.62 =  -5.686% (+/-0.00%)
core_str.py                     25.86 ->      22.55 :      -3.31 = -12.800% (+/-0.00%)
core_yield_from.py             355.61 ->       3.84 :    -351.77 = -98.920% (+/-0.00%)
misc_aes.py                    407.85 ->      32.12 :    -375.73 = -92.125% (+/-0.01%)
misc_mandel.py                3166.90 ->    2791.44 :    -375.46 = -11.856% (+/-0.01%)
misc_pystone.py               2378.74 ->     279.43 :   -2099.31 = -88.253% (+/-0.00%)
misc_raytrace.py               361.02 ->     256.57 :    -104.45 = -28.932% (+/-0.01%)
viper_call0.py                 499.49 ->     537.73 :     +38.24 =  +7.656% (+/-0.00%)
viper_call1a.py                522.73 ->     514.72 :      -8.01 =  -1.532% (+/-0.01%)
viper_call1b.py                379.77 ->     385.81 :      +6.04 =  +1.590% (+/-0.00%)
viper_call1c.py                384.11 ->     389.38 :      +5.27 =  +1.372% (+/-0.01%)
viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.00%)
viper_call2b.py                335.06 ->     341.81 :      +6.75 =  +2.015% (+/-0.01%)

Tests that are heavy on the VM (misc_aes.py and misc_pystone.py) run at about half the speed with settrace enabled.

Enabling both MICROPY_PY_SYS_SETTRACE and MICROPY_PY_SYS_SETTRACE_DUAL_VM (comparing to settrace completely disabled):

diff of scores (higher is better)
N=100 M=100                perf-pyb-std -> perf-pyb-settrace-dual-vm         diff      diff% (error%)
bm_chaos.py                    356.93 ->     353.71 :      -3.22 =  -0.902% (+/-0.01%)
bm_fannkuch.py                  72.90 ->      72.78 :      -0.12 =  -0.165% (+/-0.00%)
bm_fft.py                     2331.03 ->    2320.26 :     -10.77 =  -0.462% (+/-0.00%)
bm_float.py                   5713.01 ->    5607.72 :    -105.29 =  -1.843% (+/-0.02%)
bm_hexiom.py                    45.75 ->      45.06 :      -0.69 =  -1.508% (+/-0.00%)
bm_nqueens.py                 4111.33 ->    4035.90 :     -75.43 =  -1.835% (+/-0.00%)
bm_pidigits.py                 663.16 ->     639.93 :     -23.23 =  -3.503% (+/-0.02%)
bm_wordcount.py                 47.36 ->      46.71 :      -0.65 =  -1.372% (+/-0.00%)
core_import_mpy_multi.py       640.86 ->     608.33 :     -32.53 =  -5.076% (+/-0.01%)
core_import_mpy_single.py      105.97 ->     101.29 :      -4.68 =  -4.416% (+/-0.01%)
core_locals.py                  40.28 ->      39.92 :      -0.36 =  -0.894% (+/-0.00%)
core_qstr.py                   204.36 ->     202.55 :      -1.81 =  -0.886% (+/-0.03%)
core_str.py                     25.86 ->      25.71 :      -0.15 =  -0.580% (+/-0.00%)
core_yield_from.py             355.61 ->     328.12 :     -27.49 =  -7.730% (+/-0.01%)
misc_aes.py                    407.85 ->     398.19 :      -9.66 =  -2.369% (+/-0.01%)
misc_mandel.py                3166.90 ->    3148.39 :     -18.51 =  -0.584% (+/-0.01%)
misc_pystone.py               2378.74 ->    2336.25 :     -42.49 =  -1.786% (+/-0.00%)
misc_raytrace.py               361.02 ->     357.93 :      -3.09 =  -0.856% (+/-0.01%)
viper_call0.py                 499.49 ->     495.08 :      -4.41 =  -0.883% (+/-0.01%)
viper_call1a.py                522.73 ->     514.77 :      -7.96 =  -1.523% (+/-0.00%)
viper_call1b.py                379.77 ->     374.68 :      -5.09 =  -1.340% (+/-0.01%)
viper_call1c.py                384.11 ->     378.07 :      -6.04 =  -1.572% (+/-0.00%)
viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.01%)
viper_call2b.py                335.06 ->     332.42 :      -2.64 =  -0.788% (+/-0.00%)

Performance is almost all regained with dual-VM enabled.

Trade-offs and Alternatives

This obviously duplicates the VM, but I tried to make it as simple as possible in the implementation so that it's easy to maintain. In the future this mechanism could potentially be extended to optimise for other things, although I can't think of any obvious candidates for that.

An alternative to having a dual-VM could be to try and optimise the internal VM macros FRAME_UPDATE and TRACE_TICK. Although that could save code size, no matter how well these macros are optimised it won't be able to reach the performance of a dual-VM architecture.

I'm not sure exactly how CPython implements efficient sys.settrace these days (they've changed a lot recently with their VM), but could look there for alternative ideas as well.