LVGL Performance Investigation — Complete Effort Analysis

Ticket: DEG-335
Branch: feature/lvgl-keyboard-benchmark
Period: 2026-04-01 to 2026-04-10 (10 calendar days, 8 active)
Result: 183ms → 5.9ms per frame (31x improvement)

Estimation Basis

This analysis estimates how much work the investigation would have represented in a traditional (pre-AI) engineering environment. Each thread is estimated in story points (SP) as if performed by a single senior embedded software specialist familiar with ARM Cortex-M, MPU/cache, linker scripts, LVGL, MicroPython, and debug probe tooling, working without AI assistance.

SP are calibrated on a scale where 1 SP = 1 working day, using 0.25 increments: 0.25 is a trivial config change or single-command verification (~2 hours), 0.5 is a small focused task such as a config change with a rebuild and test cycle (~half day), 1 is a full day’s work (implement a fix, write a test, investigate a hypothesis). Estimates account for cumulative context: later tasks build on earlier learnings, so a thread that would take a full day in isolation may only take half a day when the engineer already has the test suite, linker script, and debug infrastructure set up from previous threads.
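The scale above can be written as a small conversion, which is useful when cross-checking the tables that follow. This is a minimal sketch assuming an 8-hour working day (implied by 0.25 SP being roughly 2 hours); the function names are illustrative and not part of any project tooling.

```python
HOURS_PER_DAY = 8      # assumption: 0.25 SP ~ 2 hours implies an 8-hour day
SP_INCREMENT = 0.25    # smallest unit used in the estimates


def sp_to_hours(sp):
    """Convert story points to working hours on the 1 SP = 1 day scale."""
    return sp * HOURS_PER_DAY


def is_valid_estimate(sp):
    """True if sp is a positive multiple of the 0.25 increment."""
    return sp > 0 and (sp / SP_INCREMENT).is_integer()


print(sp_to_hours(0.25))        # a trivial config change
print(sp_to_hours(1))           # a full day's work
print(is_valid_estimate(0.75))  # on the 0.25 grid
print(is_valid_estimate(0.3))   # off the grid
```

The cumulative-context discount described above is a judgment applied per thread and is not modelled here.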

In practice this work was performed by an engineer directing Claude Code agents, with roughly 5 hours of direct human attention across 10 calendar days and ~50 hours of autonomous agent compute. The SP estimates below represent what the same work would have cost a human engineer working alone, which is the relevant comparison for ROI assessment.
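The comparison in this paragraph reduces to simple arithmetic. The sketch below assumes 1 SP equals one 8-hour day and uses the approximate figures quoted above (about 5 hours of human attention, about 50 hours of agent compute) together with the 31.5 SP grand total; the `leverage` helper is hypothetical and only illustrates the ratio.

```python
HOURS_PER_SP = 8            # assumption: 1 SP = one 8-hour working day

estimated_sp = 31.5         # grand total from the tables in this document
human_hours = 5             # approximate direct human attention
agent_hours = 50            # approximate autonomous agent compute


def leverage(sp, hours_spent, hours_per_sp=HOURS_PER_SP):
    """Ratio of estimated solo-engineer hours to hours actually spent."""
    return (sp * hours_per_sp) / hours_spent


human_leverage = leverage(estimated_sp, human_hours)
combined_leverage = leverage(estimated_sp, human_hours + agent_hours)

print(f"solo estimate: {estimated_sp * HOURS_PER_SP:.0f} h")
print(f"vs human attention: {human_leverage:.1f}x")
print(f"vs human + agent hours: {combined_leverage:.1f}x")
```

Agent compute hours are not equivalent to engineer hours, so the combined ratio is a lower bound for orientation rather than a cost figure.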

Grand Summary

| Phase | Period | Threads | SP | Result |
|---|---|---|---|---|
| Investigation & fixes | Apr 1-2 | 21 | 11 | 183ms → 57ms |
| SWO/trace infrastructure | Apr 1-2 | 12 | 6 | ITM working, DWT dead end |
| Profiling & OCRAM | Apr 7-8 | 13 | 6.25 | 57ms → 9.3ms |
| PXP, architecture, production | Apr 9-10 | 25 | 8.25 | 9.3ms → 5.9ms + runtime config |
| Total | Apr 1-10 | 71 | 31.5 | 31x improvement |

Phase 1: Investigation & Fixes (Apr 1-2)

| # | Thread | Category | Outcome | SP |
|---|---|---|---|---|
| 1 | Brainstorm synthesis (3-agent) | Analysis | Completed: ranked hypothesis list, top hypothesis wrong | 0.5 |
| 2 | Benchmark test suite creation | Test infra | Completed: 18 test scripts, _common.py helper | 1 |
| 3 | Baseline capture & Phase 0-2 tests | Analysis | Completed: 183ms baseline, rotation hypothesis refuted | 0.5 |
| 4 | MPU discovery and configuration | Driver fix | Completed: MPU disabled, added TEX=1,C=1,B=1 for SDRAM | 0.75 |
| 5 | Indev callback overhead investigation | Analysis | Completed then corrected: methodology artefact | 0.25 |
| 6 | GC heap size reduction | Optimisation | Completed: linker LCD buffers, 64→16MB heap, gc 107ms→61us | 1 |
| 7 | Font glyph cache (LV_CACHE_DEF_SIZE) | Optimisation | Dead end: controls image cache not fonts | 0.25 |
| 8 | Keyboard styling optimisation | Optimisation | Completed: flat-fill removes borders, 36% improvement | 0.75 |
| 9 | PXP draw acceleration verification | Analysis | Completed: PXP confirmed active (at this point) | 0.25 |
| 10 | LV_DEF_REFR_PERIOD reduction (33→16ms) | Optimisation | Completed: halves worst-case touch latency | 0.25 |
| 11 | GPIO13 interrupt support in MicroPython | Driver fix | Completed: IRQ handler, GT928Touch interrupt mode | 1 |
| 12 | Touch interrupt deadlock (micropython.schedule) | Driver fix | Completed: redesigned to timer-resume approach | 1 |
| 13 | Vsync spurious wait bug | Driver fix | Completed: s_transferDone stale flag, 62→57ms | 0.75 |
| 14 | DIRECT vs FULL render mode investigation | Analysis | Completed: FULL mode worse (95ms), DIRECT optimal | 0.25 |
| 15 | Linker script changes (lcd_buffers, gc_heap) | Driver fix | Completed: NOLOAD section, gc_heap capped at 16MB | 0.5 |
| 16 | LCD_FB_COUNT=3 (missing PXP rotation buffer) | Driver fix | Completed: was 2, needed 3 | 0.25 |
| 17 | Benchmark methodology correction | Test infra | Completed: tick_inc/sleep pattern caused accumulation artefact | 0.25 |
| 18 | Zephyr phyCORE-RT1170 config comparison | Analysis | Completed: no performance-relevant differences found | 0.5 |
| 19 | LV_USE_SYSMON perf monitor overlay | Optimisation | Completed: enabled, hidden behind keyboard widget | 0.25 |
| 20 | pyocd J-Link probe fix | Tooling | Completed: disable_dialog_boxes before open bug | 0.5 |
| 21 | probe-rs flash reliability | Tooling | Dead end: DAP FAULT on completion, abandoned | 0.25 |
| | Phase 1 total | | | 11 |
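Thread 4 marks the SDRAM region as TEX=1, C=1, B=1, which on ARMv7-M means normal memory, write-back with read and write allocate. The sketch below shows where those bits sit in the MPU_RASR register. Python is used only to illustrate the bit layout; the firmware would set this in C. The 64MB region size, full-access permissions (AP=0b011), and the `mpu_rasr` helper are assumptions for illustration, not values taken from the actual driver.

```python
def mpu_rasr(size_bytes, tex, c, b, s=0, ap=0b011, xn=0, enable=1):
    """Build an ARMv7-M MPU_RASR value for one region."""
    # Region size must be a power of two, at least 32 bytes.
    if size_bytes < 32 or size_bytes & (size_bytes - 1):
        raise ValueError("region size must be a power of two >= 32 bytes")
    size_field = size_bytes.bit_length() - 2   # region size = 2**(SIZE + 1)
    return (
        (xn << 28)            # execute-never
        | (ap << 24)          # access permissions
        | (tex << 19)         # type extension
        | (s << 18)           # shareable
        | (c << 17)           # cacheable
        | (b << 16)           # bufferable
        | (size_field << 1)   # region size encoding
        | enable              # region enable
    )


# Hypothetical 64MB SDRAM region with the attributes named in thread 4.
rasr = mpu_rasr(64 * 1024 * 1024, tex=1, c=1, b=1)
print(hex(rasr))
```

With the region cacheable, any buffer that DMA or the display controller reads still needs explicit cache maintenance or a separate non-cacheable region.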

Phase 2: SWO/Trace Infrastructure (Apr 1-2)

| # | Thread | Outcome | SP |
|---|---|---|---|
| 22 | Orbuculum build and setup | Completed: built, connected, never produced profiling data | 0.25 |
| 23 | SWO via J-Link: M7 TPIU path | Dead end: M7 TPIU only outputs sync/timestamp | 1 |
| 24 | SWO via J-Link: GDB Server streaming | Completed: discovered monitor SWO EnableTarget | 0.25 |
| 25 | SWO: dual TPIU collision diagnosis | Partial: identified collision on GPIO_LPSR_11 | 0.25 |
| 26 | SWO: system SWO via APB-AP#2 | Completed: ITM printf proven working | 1 |
| 27 | SWO: OpenOCD config bugs | Completed: wrong AP and wrong clock | 0.25 |
| 28 | RT1170 CoreSight topology mapping | Completed: ROM table walk, ERR050708 errata | 0.75 |
| 29 | DWT PC sampling investigation | Dead end: counters work but trace packets non-functional | 1 |
| 30 | IOMUXC_LPSR_GPR37 / debug auth | Partial: reads 0x00, open lead | 0.25 |
| 31 | SWO firmware init (board_init.c) | Completed: board_config_swo_trace in SystemInitHook | 0.5 |
| 32 | SWO-Lite / SSPSR=0 / FFCR analysis | Completed: unimplemented registers, not failures | 0.25 |
| 33 | NXP AN14071 / AN13234 / ERR050708 research | Completed: key references identified | 0.25 |
| | Phase 2 total | | 6 |
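Thread 27 mentions a wrong clock in the OpenOCD configuration. For asynchronous SWO, the output baud rate is the trace clock divided by (prescaler + 1), so the firmware and the debugger must agree on both numbers. The sketch below shows that relationship; the 132 MHz trace clock and 4 MHz baud rate are hypothetical inputs, not the values used on the RT1170 board, and `swo_prescaler` is an illustrative helper.

```python
def swo_prescaler(trace_clock_hz, swo_baud_hz):
    """Return (prescaler, actual_baud) for an asynchronous SWO output.

    The prescaler is the value written to the TPIU prescaler register,
    where baud = trace_clock / (prescaler + 1).
    """
    divider = round(trace_clock_hz / swo_baud_hz)
    if divider < 1:
        raise ValueError("SWO baud rate cannot exceed the trace clock")
    actual_baud = trace_clock_hz / divider
    return divider - 1, actual_baud


def baud_error_percent(requested_hz, actual_hz):
    """Relative mismatch between the requested and the achievable baud rate."""
    return 100 * abs(actual_hz - requested_hz) / requested_hz


# Hypothetical numbers: a 132 MHz trace clock and a 4 MHz SWO link.
prescaler, actual = swo_prescaler(132_000_000, 4_000_000)
print(prescaler, actual, baud_error_percent(4_000_000, actual))
```

If the debugger is told a different trace clock than the one the firmware actually uses, it computes a different baud rate and the stream decodes as garbage, which may look like a dead trace port.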

Phase 3: Profiling & OCRAM (Apr 7-8)

| # | Thread | Category | Outcome | SP |
|---|---|---|---|---|
| 34 | Touch interrupt v2 validation | Analysis | Completed: stable 30s+ typing, no deadlock | 0.25 |
| 35 | Render profile test (buttonmatrix) | Analysis | Completed: identified widget overhead bottleneck | 0.75 |
| 36 | Buttonmatrix clip area fix | Optimisation | Completed: 102ms→25.7ms (4x), upstream PR #9946 | 0.75 |
| 37 | Flush profiling instrumentation | Analysis | Completed: flush not the bottleneck (<5%) | 0.5 |
| 38 | DWT CYCCNT instrumentation (refr cycle) | Analysis | Completed: obj_walk 95-97%, draw_wait <0.5% | 1 |
| 39 | Render breakdown test suite | Test infra | Completed: test_render_breakdown.py, test_phase_timing.py | 0.5 |
| 40 | OCRAM heap hypothesis test | Analysis | Completed: 25.7ms→8.3ms proof of concept (512KB) | 0.5 |
| 41 | OCRAM+OCRAM2 linker configuration | Driver fix | Completed: 1.25MB contiguous heap, 9.3ms final | 0.5 |
| 42 | Touch interrupt ISR init race fix | Driver fix | Completed: _read_timer=None guard | 0.25 |
| 43 | LVGL upstream PR review (Copilot) | Code review | Completed: margin, helper, bare blocks, row break | 0.5 |
| 44 | LV_DRAW_BUF_ALIGN submodule discovery | Tooling | Completed: colleague found ALIGN=4 vs 32 | 0.25 |
| 45 | Submodule branch pointer updates | Tooling | Completed: pinned to fork with clip fix | 0.25 |
| 46 | Textarea invalidation bottleneck docs | Documentation | Completed: why interactive typing still slow | 0.25 |
| | Phase 3 total | | | 6.25 |
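Thread 38 attributes 95-97% of the refresh cycle to obj_walk using DWT CYCCNT reads. Turning raw counter reads into milliseconds and per-phase shares can be sketched as follows. The 996 MHz core clock and the sample cycle counts are assumptions for illustration, not measurements from this investigation, and the helper names are hypothetical.

```python
CPU_HZ = 996_000_000   # assumption: core clock; substitute the real value


def cycles_elapsed(start, end):
    """Difference of two 32-bit CYCCNT reads, tolerant of one wrap-around."""
    return (end - start) & 0xFFFFFFFF


def cycles_to_ms(cycles, cpu_hz=CPU_HZ):
    """Convert a cycle count to milliseconds at the given core clock."""
    return cycles * 1000 / cpu_hz


def phase_shares(phases):
    """Map {phase name: cycles} to {phase name: percent of total}."""
    total = sum(phases.values())
    return {name: 100 * cycles / total for name, cycles in phases.items()}


# Made-up sample counts, chosen only to show the calculation.
sample = {"obj_walk": 24_000_000, "draw_wait": 100_000, "flush": 900_000}
for name, share in phase_shares(sample).items():
    print(f"{name}: {cycles_to_ms(sample[name]):.2f} ms ({share:.1f}%)")
```

At roughly 1 GHz a 32-bit cycle counter wraps about every 4.3 seconds, so the masked subtraction is sufficient for frame-scale intervals but not for multi-second measurements.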

Phase 4: PXP, Architecture, Production (Apr 9-10)

| # | Thread | Category | Outcome | SP |
|---|---|---|---|---|
| 47 | PXP draw acceleration enable | Optimisation | Completed: 9.3ms→5.9ms (169 FPS, 31x from baseline) | 0.25 |
| 48 | PXP rotation flag investigation | Driver fix | Completed: found micropython.mk override to 0 | 0.25 |
| 49 | PXP rotation benchmark | Analysis | Completed: 39ms in FULL mode (26 FPS) | 0.25 |
| 50 | Display config spec (DISPLAY_CONFIG_SPEC.md) | Architecture | Completed: iterative Q&A, ownership boundaries | 0.5 |
| 51 | Display config Phase 1: display.mk + lcd_buffers.ld | Architecture | Completed: centralised LVGL flags, linker snippet | 1 |
| 52 | Display config Phase 2: runtime rotation guards | Driver fix | Completed: 16x #if DEMO_USE_ROTATE → runtime | 0.5 |
| 53 | Display config Phase 3: Python rotation API | Architecture | Completed: ILI9881CDisplay(rotation=270), touch auto-config | 0.75 |
| 54 | Review agent Phase 1 findings | Code review | Completed: legacy flags, include guard, #ifndef, default | 0.25 |
| 55 | Widget demo portrait layout | Test infra | Completed: keyboard at bottom, flex layout | 0.25 |
| 56 | Widget demo landscape layout | Test infra | Completed: rotation=270, touch mapped | 0.25 |
| 57 | aiorepl integration | Test infra | Completed: async REPL while demo runs | 0.25 |
| 58 | File-based rotation config (display.cfg) | Architecture | Completed: reads rotation at boot | 0.25 |
| 59 | SNVS register investigation | Dead end | Hard fault on access, needs unlock sequence | 0.25 |
| 60 | WAKE_UP pin investigation | Dead end | Not wired to expected GPIO, abandoned | 0.25 |
| 61 | Submodule state verification | Tooling | Completed: all remotes checked, micropython pushed | 0.25 |
| 62 | GitLab MR !49 preparation | Documentation | Completed: changes table with per-fix impact | 0.25 |
| 63 | Clean clone build verification | Test infra | Completed: fresh clone builds and runs | 0.25 |
| 64 | Colleague build failure diagnosis | Tooling | Completed: ALIGN=4 vs 32, 174ms result | 0.25 |
| 65 | Code review agents (4 parallel + validation) | Code review | Completed: architecture, quality, completeness, security | 0.25 |
| 66 | Screenshot test for partial-clip | Test infra | Completed: clipped container + scrolled buttonmatrix | 0.25 |
| 67 | mbm migration (v0.1.0 → v2.0.3 toml) | Tooling | Completed: config migrated, RTC.memory tracked | 0.25 |
| 68 | CI troubleshooting (GitHub failures) | Tooling | Completed: deferred complex fixes, verified settings | 0.25 |
| 69 | Git history cleanup | Tooling | Completed: consolidated across 4 repos | 0.5 |
| 70 | Code simplification review (/simplify) | Code review | Completed: automated complexity analysis | 0.25 |
| 71 | RTC.memory() tracking (micropython#18960) | Architecture | Completed: added to mbm.toml as tracked branch | 0.25 |
| | Phase 4 total | | | 8.25 |
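Thread 58 reads the display rotation from display.cfg at boot, and thread 53 passes it to the driver as ILI9881CDisplay(rotation=270). The file format is not described in this document, so the sketch below assumes a simple `rotation=<degrees>` line with optional `#` comments; `parse_rotation` and `read_rotation` are hypothetical helpers, not the project's implementation.

```python
VALID_ROTATIONS = (0, 90, 180, 270)


def parse_rotation(text, default=0):
    """Return the rotation from config text, or default if absent or invalid."""
    for line in text.split("\n"):
        line = line.split("#", 1)[0].strip()    # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "rotation":
            try:
                rotation = int(value)
            except ValueError:
                return default
            return rotation if rotation in VALID_ROTATIONS else default
    return default


def read_rotation(path="display.cfg", default=0):
    """Read the rotation at boot; a missing file falls back to default."""
    try:
        with open(path) as f:
            return parse_rotation(f.read(), default)
    except OSError:
        return default


print(parse_rotation("rotation = 270  # landscape\n"))
# On the device the result would then be passed to the display driver,
# for example ILI9881CDisplay(rotation=read_rotation()).
```

Falling back to a default on any parse problem keeps a corrupt or missing file from preventing the display from coming up.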

Totals

| Phase | SP |
|---|---|
| Phase 1: Investigation & fixes | 11 |
| Phase 2: SWO/trace infrastructure | 6 |
| Phase 3: Profiling & OCRAM | 6.25 |
| Phase 4: PXP, architecture, production | 8.25 |
| Grand total | 31.5 |

By Category

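| Category | SP | % | Threads |
|---|---|---|---|
| Driver fix | 5.75 | 18% | 10 |
| Analysis / profiling | 5.5 | 17% | 12 |
| Architecture | 2.75 | 9% | 5 |
| Optimisation | 3.5 | 11% | 7 |
| Test infrastructure | 3 | 10% | 8 |
| Tooling | 2.75 | 9% | 9 |
| Code review | 1.25 | 4% | 4 |
| SWO/trace | 6 | 19% | 12 |
| Documentation | 0.5 | 2% | 2 |
| Dead end | 3 | 10% | 6 |

Note: SP and thread counts are summed from the per-phase tables above. The Dead end row overlaps with other categories: four of its six threads (2.5 SP) are also counted under Optimisation, Tooling, or SWO/trace, so the rows sum to 34 SP rather than 31.5.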

By Outcome

| Outcome | SP | % | Threads |
|---|---|---|---|
| Completed (successful) | 27.75 | 88% | 62 |
| Dead end | 3 | 10% | 6 |
| Partial (open lead) | 0.5 | 2% | 2 |
| Completed then corrected | 0.25 | <1% | 1 |

Dead Ends Itemised

| # | Thread | SP | Value of dead end |
|---|---|---|---|
| 7 | Font glyph cache | 0.25 | Ruled out image cache as font bottleneck |
| 21 | probe-rs flash | 0.25 | Confirmed unreliable, avoided future use |
| 23 | M7 TPIU SWO path | 1 | Mapped incorrect path, led to system SWO discovery |
| 29 | DWT PC sampling | 1 | Confirmed silicon limitation, avoided further investment |
| 59 | SNVS registers | 0.25 | Identified need for RTC.memory() upstream feature |
| 60 | WAKE_UP pin | 0.25 | Board routing confirmed, saved future investigation |

Total dead end investment: 3 SP (10% of total). Most produced useful negative results.


Performance Progression

| Thread(s) | Change | Before | After | Improvement |
|---|---|---|---|---|
| 4 | MPU cacheable SDRAM | | | 31% on fills |
| 6, 15 | GC heap 64→16MB + linker buffers | 183ms | 57ms | 3.2x |
| 8 | Flat-fill keyboard styling | 184ms | 118ms | 36% |
| 13 | Vsync spurious wait fix | 62ms | 57ms | 8% |
| 36 | Buttonmatrix clip area fix | 102ms | 25.7ms | 4x |
| 41 | GC heap → OCRAM 1.25MB | 25.7ms | 9.3ms | 2.8x |
| 47 | PXP draw acceleration | 9.3ms | 5.9ms | 1.6x |
| Cumulative | | 183ms | 5.9ms | 31x |
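The improvement factors and frame rates in this table follow directly from the before and after frame times. The sketch below recomputes them from the figures in the table; the helper names are illustrative.

```python
def speedup(before_ms, after_ms):
    """How many times faster the frame renders after the change."""
    return before_ms / after_ms


def percent_reduction(before_ms, after_ms):
    """Frame-time reduction as a percentage of the original time."""
    return 100 * (before_ms - after_ms) / before_ms


def fps(frame_ms):
    """Frames per second for a given per-frame time."""
    return 1000 / frame_ms


# (threads, before ms, after ms) taken from the progression table.
steps = [
    ("6, 15", 183, 57),
    ("36", 102, 25.7),
    ("41", 25.7, 9.3),
    ("47", 9.3, 5.9),
]
for threads, before, after in steps:
    print(f"threads {threads}: {speedup(before, after):.1f}x")

print(f"thread 8: {percent_reduction(184, 118):.0f}% reduction")
print(f"cumulative: {speedup(183, 5.9):.0f}x, {fps(5.9):.0f} FPS")
```

The rows do not form a single multiplicative chain (thread 13 ends at 57ms while thread 36 starts at 102ms), so the cumulative figure is taken from the first and last measurements rather than from the product of the per-step factors.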