Emitted code epilogue block contains a superfluous jump to the next available code address.
Port, board and/or hardware
mpy-cross built from git commit 6db29978ac8954f3686f9eb59dd71b55c3495456 (current master)
MicroPython version
MicroPython v1.25.0-preview.216.g6db29978a.dirty on 2025-01-17; mpy-cross emitting mpy v6.3
Reproduction
- Run mpy-cross -X emit=native -march=debug tests/basics/0prelim.py
- Look at the last part of the output for a jump to label_0.
This applies to any supported architecture with a native emitter, but -march=debug is used in the command line to make the problem visible without using a disassembler.
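To check a longer listing without reading it by eye, a small script can scan the captured debug output for the pattern. This is only a sketch: the function name find_redundant_jumps is hypothetical, and the regular expressions assume the line shapes shown in the excerpts in this report (jump(...), label(...), and a dead_code prefix on suppressed lines).

```python
import re

# Patterns assume the line shapes shown in the excerpts of this report.
JUMP_RE = re.compile(r"^jump\((\w+)\)$")
LABEL_RE = re.compile(r"^label\((\w+)\)$")


def find_redundant_jumps(listing):
    """Return 1-based line numbers of live jumps that target the next live line."""
    lines = [line.strip() for line in listing.splitlines()]
    hits = []
    for i, line in enumerate(lines):
        jump = JUMP_RE.match(line)
        if jump is None:
            continue  # not a live jump (dead_code lines never match)
        # Skip blank lines and lines the emitter marked as dead code.
        j = i + 1
        while j < len(lines) and (not lines[j] or lines[j].startswith("dead_code")):
            j += 1
        if j < len(lines):
            label = LABEL_RE.match(lines[j])
            if label is not None and label.group(1) == jump.group(1):
                hits.append(i + 1)
    return hits


sample = """\
mov_local_reg(local_3, r_ret)
jump(label_0)
label(label_0)
EXIT(0)
"""
print(find_redundant_jumps(sample))
```

Feeding the text printed by the command above to find_redundant_jumps should flag each jump whose target label is the next live line.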
Expected behaviour
No redundant jump to the immediately following address should be emitted at the end of a block.
Observed behaviour
The following appears at the end of an emitted block of code (taken from tests/basics/0prelim.py):
...
jump(label_0)
label(label_0)
EXIT(0)
This also applies to functions that need a more involved cleanup procedure:
(tests/basics/async_for.py):
...
jump(label_0)
dead_code load(r_temp0, r_fun_table, 0)
dead_code store(r_temp0, r_local2, 5)
dead_code mov_reg_imm(r_temp0, 40=0x28)
dead_code add(r_temp0, r_local2)
dead_code store(r_temp0, r_local2, 2)
dead_code mov_reg_imm(r_temp0, 0=0x0)
dead_code mov_local_reg(local_3, r_temp0)
dead_code jump(label_0)
label(label_0)
call_ind(nlr_pop)
mov_reg_local(r_ret, local_3)
EXIT(0)
(tests/basics/array_micropython.py):
...
mov_local_reg(local_3, r_ret)
jump(label_0)
label(label_0)
mov_reg_local(r_arg1, local_6)
call_ind(native_swap_globals)
call_ind(nlr_pop)
mov_reg_local(r_ret, local_3)
EXIT(0)
Additional Information
This is not specific to mpy-cross: the same issue occurs when emitting a block of code at runtime.
py/emit: Improve the logic to detect and eliminate dead code
The existing dead-code detection logic, which relied on last_emit_was_return_value, was not very good.
This new logic tracks when an unconditional jump/raise occurs in the emitted code stream (bytecode or native machine code) and suppresses all subsequent code, until a label is assigned. This eliminates a lot of cases of dead code, with relatively simple logic.
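The tracking described above can be sketched with a toy emitter. This is a minimal illustration under stated assumptions, not the actual emitter code: the class ToyEmitter, its method names, and the suppress flag are all hypothetical, and instructions are modelled as plain tuples.

```python
class ToyEmitter:
    """Hypothetical single-pass emitter used only to illustrate the idea."""

    def __init__(self):
        self.code = []         # instructions that were actually emitted
        self.suppress = False  # True while the current position is unreachable

    def _emit(self, op, arg=None):
        # Drop everything while the current position cannot be reached.
        if not self.suppress:
            self.code.append((op, arg))

    def load_const(self, value):
        self._emit("load_const", value)

    def jump(self, label):
        self._emit("jump", label)
        self.suppress = True   # nothing after an unconditional jump can run

    def raise_exc(self):
        self._emit("raise")
        self.suppress = True   # nothing after a raise can run either

    def label_assign(self, label):
        # A label may be a jump target, so code after it is reachable again.
        self.suppress = False
        self._emit("label", label)


emitter = ToyEmitter()
emitter.load_const(1)
emitter.jump("L0")
emitter.load_const(2)   # unreachable, suppressed
emitter.jump("L0")      # unreachable, suppressed
emitter.label_assign("L0")
emitter.load_const(3)
for instruction in emitter.code:
    print(instruction)
```

In this sketch the first jump to L0 survives even though L0 is assigned immediately afterwards, because the flag only removes unreachable code; it does not detect a jump to the next instruction.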
This PR has the following code size change:
bare-arm: -16 -0.028%
minimal x86: -60 -0.036%
unix x64: -368 -0.070%
unix nanbox: -80 -0.017%
stm32: -204 -0.052% PYBV10
cc3200: +0 +0.000%
esp8266: -228 -0.033% GENERIC
esp32: -224 -0.015% GENERIC[incl -40(data)]
mimxrt: -192 -0.054% TEENSY40
renesas-ra: -200 -0.032% RA6M2_EK
nrf: +28 +0.015% pca10040
rp2: -256 -0.050% PICO
samd: -12 -0.009% ADAFRUIT_ITSYBITSY_M4_EXPRESS
The generated bytecode is also sometimes smaller (that's the whole point!). For example, when compiling all of uasyncio, the new optimisation reduces the total size of the uasyncio .mpy files by 13 bytes (from 8464 down to 8451).
One example of an optimisation is a function that ends with a raise. In that case the compiler no longer emits a redundant return None at the end of the function (saving 2 bytes).
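A function of the following shape is the kind of case meant here. The function name fail and its argument are made up for illustration; the point is only that the last statement in the body is a raise, so an implicit return None placed after it could never execute.

```python
def fail(message):
    # The body ends with a raise, so control never falls off the end of the
    # function and an implicit "return None" after this line is unreachable.
    raise ValueError(message)


try:
    fail("bad input")
    raised = False
except ValueError as exc:
    raised = str(exc) == "bad input"
print(raised)
```

The function behaves the same with or without the trailing return None; only the size of the compiled output differs.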
This also uncovered a latent bug in the VM which is fixed here.