← index #7372PR #18571
Related · medium · value 3.823
QUERY · ISSUE

py: settrace is incompatible with Interrupt Handlers

openby amirgonopened 2021-06-08updated 2021-06-09
py-core

When MICROPY_PY_SYS_SETTRACE is enabled, bytecode execution allocates RAM, which is forbidden on interrupt handlers.

https://github.com/micropython/micropython/blob/4d9e657f0ee881f4a41093ab89ec91d03613744d/py/vm.c#L142-L149

https://github.com/micropython/micropython/blob/e9e9c76ddf131f8e50b0ae2d44501d3cd88537ef/py/profile.c#L315-L318

CANDIDATE · PULL REQUEST

py: Add a new dual-VM option to improve performance of `sys.settrace`

openby dpgeorgeopened 2025-12-15updated 2026-03-17
port-webassemblypy-core

Summary

MicroPython supports the standard sys.settrace() facility, to trace bytecode execution. That feature is enabled through MICROPY_PY_SYS_SETTRACE and is disabled by default because it really slows the VM down, and increases memory use in many places due to the overhead of tracking the state of functions.

This PR adds a new option MICROPY_PY_SYS_SETTRACE_DUAL_VM that, when enabled, duplicates the VM, the mp_execute_bytecode() function:

  1. One version is called mp_execute_bytecode_standard() and has settrace disabled.
  2. The other version is called mp_execute_bytecode_tracing() and has settrace enabled.

Then a wrapper function mp_execute_bytecode() calls into the appropriate VM depending on whether the user has enabled a settrace callback or not.

The result is:

  1. An increase of about 5k in firmware size (on ARM Cortex-M) when enabling the dual-VM (that's on top of having settrace already enabled).
  2. Almost no performance is lost if a settrace callback is not registered.
  3. Full support for settrace if needed.

This is useful for, eg, webassembly where PyScript wants to have sys.settrace enabled, and can afford to have a dual-VM in order not to lose performance with settrace enabled.

This PR also includes some minor fixes for the webassembly port to be able to run the performance benchmarks on that port.

Testing

Tested by running the standard test suite on the unix and webassembly ports, which also have dual-VM enabled in CI.

On the webassembly port, the perf_bench/misc_aes.py performance test runs 200 times faster (!) with dual-VM enabled, compared to just enabling settrace without the dual-VM. So it's a very big win there.

On PYBV10 the metrics are:

  • code size increases by 14.5k with MICROPY_PY_SYS_SETTRACE enabled
  • code size increases by an extra 5.3k with MICROPY_PY_SYS_SETTRACE_DUAL_VM enabled as well
  • performance is roughly halved when enabling settrace
  • performance is pretty much all regained when enabling dual-VM

The performance measurements on PYBV10 in detail

Enabling MICROPY_PY_SYS_SETTRACE:

diff of scores (higher is better)
N=100 M=100                perf-pyb-std -> perf-pyb-settrace         diff      diff% (error%)
bm_chaos.py                    356.93 ->     261.95 :     -94.98 = -26.610% (+/-0.01%)
bm_fannkuch.py                  72.90 ->      61.28 :     -11.62 = -15.940% (+/-0.00%)
bm_fft.py                     2331.03 ->    1681.29 :    -649.74 = -27.874% (+/-0.02%)
bm_hexiom.py                    45.75 ->      13.35 :     -32.40 = -70.820% (+/-0.00%)
bm_nqueens.py                 4111.33 ->    2695.70 :   -1415.63 = -34.432% (+/-0.00%)
bm_pidigits.py                 663.16 ->     606.83 :     -56.33 =  -8.494% (+/-0.02%)
bm_wordcount.py                 47.36 ->      39.05 :      -8.31 = -17.546% (+/-0.01%)
core_import_mpy_multi.py       640.86 ->     483.22 :    -157.64 = -24.598% (+/-0.01%)
core_import_mpy_single.py      105.97 ->      84.55 :     -21.42 = -20.213% (+/-0.01%)
core_locals.py                  40.28 ->      32.41 :      -7.87 = -19.538% (+/-0.00%)
core_qstr.py                   204.36 ->     192.74 :     -11.62 =  -5.686% (+/-0.00%)
core_str.py                     25.86 ->      22.55 :      -3.31 = -12.800% (+/-0.00%)
core_yield_from.py             355.61 ->       3.84 :    -351.77 = -98.920% (+/-0.00%)
misc_aes.py                    407.85 ->      32.12 :    -375.73 = -92.125% (+/-0.01%)
misc_mandel.py                3166.90 ->    2791.44 :    -375.46 = -11.856% (+/-0.01%)
misc_pystone.py               2378.74 ->     279.43 :   -2099.31 = -88.253% (+/-0.00%)
misc_raytrace.py               361.02 ->     256.57 :    -104.45 = -28.932% (+/-0.01%)
viper_call0.py                 499.49 ->     537.73 :     +38.24 =  +7.656% (+/-0.00%)
viper_call1a.py                522.73 ->     514.72 :      -8.01 =  -1.532% (+/-0.01%)
viper_call1b.py                379.77 ->     385.81 :      +6.04 =  +1.590% (+/-0.00%)
viper_call1c.py                384.11 ->     389.38 :      +5.27 =  +1.372% (+/-0.01%)
viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.00%)
viper_call2b.py                335.06 ->     341.81 :      +6.75 =  +2.015% (+/-0.01%)

Tests that are heavy on the VM (misc_aes.py and misc_pystone.py) run at about half the speed with settrace enabled.

Enabling both MICROPY_PY_SYS_SETTRACE and MICROPY_PY_SYS_SETTRACE_DUAL_VM (comparing to settrace completely disabled):

diff of scores (higher is better)
N=100 M=100                perf-pyb-std -> perf-pyb-settrace-dual-vm         diff      diff% (error%)
bm_chaos.py                    356.93 ->     353.71 :      -3.22 =  -0.902% (+/-0.01%)
bm_fannkuch.py                  72.90 ->      72.78 :      -0.12 =  -0.165% (+/-0.00%)
bm_fft.py                     2331.03 ->    2320.26 :     -10.77 =  -0.462% (+/-0.00%)
bm_float.py                   5713.01 ->    5607.72 :    -105.29 =  -1.843% (+/-0.02%)
bm_hexiom.py                    45.75 ->      45.06 :      -0.69 =  -1.508% (+/-0.00%)
bm_nqueens.py                 4111.33 ->    4035.90 :     -75.43 =  -1.835% (+/-0.00%)
bm_pidigits.py                 663.16 ->     639.93 :     -23.23 =  -3.503% (+/-0.02%)
bm_wordcount.py                 47.36 ->      46.71 :      -0.65 =  -1.372% (+/-0.00%)
core_import_mpy_multi.py       640.86 ->     608.33 :     -32.53 =  -5.076% (+/-0.01%)
core_import_mpy_single.py      105.97 ->     101.29 :      -4.68 =  -4.416% (+/-0.01%)
core_locals.py                  40.28 ->      39.92 :      -0.36 =  -0.894% (+/-0.00%)
core_qstr.py                   204.36 ->     202.55 :      -1.81 =  -0.886% (+/-0.03%)
core_str.py                     25.86 ->      25.71 :      -0.15 =  -0.580% (+/-0.00%)
core_yield_from.py             355.61 ->     328.12 :     -27.49 =  -7.730% (+/-0.01%)
misc_aes.py                    407.85 ->     398.19 :      -9.66 =  -2.369% (+/-0.01%)
misc_mandel.py                3166.90 ->    3148.39 :     -18.51 =  -0.584% (+/-0.01%)
misc_pystone.py               2378.74 ->    2336.25 :     -42.49 =  -1.786% (+/-0.00%)
misc_raytrace.py               361.02 ->     357.93 :      -3.09 =  -0.856% (+/-0.01%)
viper_call0.py                 499.49 ->     495.08 :      -4.41 =  -0.883% (+/-0.01%)
viper_call1a.py                522.73 ->     514.77 :      -7.96 =  -1.523% (+/-0.00%)
viper_call1b.py                379.77 ->     374.68 :      -5.09 =  -1.340% (+/-0.01%)
viper_call1c.py                384.11 ->     378.07 :      -6.04 =  -1.572% (+/-0.00%)
viper_call2a.py                510.03 ->     502.42 :      -7.61 =  -1.492% (+/-0.01%)
viper_call2b.py                335.06 ->     332.42 :      -2.64 =  -0.788% (+/-0.00%)

Performance is almost all regained with dual-VM enabled.

Trade-offs and Alternatives

This obviously duplicates the VM, but I tried to make it as simple as possible in the implementation so that it's easy to maintain. In the future this mechanism could potentially be extended to optimise for other things, although I can't think of any obvious candidates for that.

An alternative to having a dual-VM could be to try and optimise the internal VM macros FRAME_UPDATE and TRACE_TICK. Although that could save code size, no matter how well these macros are optimised it won't be able to reach the performance of a dual-VM architecture.

I'm not sure exactly how CPython implements efficient sys.settrace these days (they've changed a lot recently with their VM), but could look there for alternative ideas as well.

Keyboard

j / / n
next pair
k / / p
previous pair
1 / / h
show query pane
2 / / l
show candidate pane
c
copy suggested comment
r
toggle reasoning
g i
go to index
?
show this help
esc
close overlays

press ? or esc to close

copied