Ticket: DEG-335
Branch: feature/lvgl-keyboard-benchmark
Period: 2026-04-01 to 2026-04-10 (10 calendar days, 8 active)
Result: 183ms → 5.9ms per frame (31x improvement)
This effort analysis estimates how much work the investigation would have represented in a traditional (pre-AI) engineering environment. Each thread is estimated in story points (SP) as if performed by a single senior embedded software specialist, familiar with ARM Cortex-M, MPU/cache configuration, linker scripts, LVGL, MicroPython, and debug probe tooling, working without AI assistance.
Story points (SP) are calibrated on a scale where 1 SP = 1 working day, in 0.25 increments: 0.25 is a trivial config change or single-command verification (~2 hours); 0.5 is a small focused task such as a config change with a rebuild and test cycle (~half a day); 1 is a full day’s work (implementing a fix, writing a test, or investigating a hypothesis). Estimates account for cumulative context: later tasks build on earlier learnings, so a thread that would take a full day in isolation may take only half a day when the engineer already has the test suite, linker script, and debug infrastructure set up from previous threads.
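The scale above implies an 8-hour working day, since 0.25 SP corresponds to about 2 hours. A minimal sketch of that conversion follows; the function name `hours_to_sp`, the 8-hour constant, and the 0.25 SP floor are illustrative assumptions, not part of the original estimation process.

```python
HOURS_PER_DAY = 8.0  # assumption: implied by "0.25 SP is about 2 hours"
SP_STEP = 0.25       # estimates are expressed in 0.25 SP increments


def hours_to_sp(hours: float) -> float:
    """Convert effort in hours to SP, rounded half-up to the nearest 0.25.

    The smallest estimate on the scale is 0.25 SP, so anything shorter
    is reported as 0.25 rather than 0.
    """
    steps = int(hours / HOURS_PER_DAY / SP_STEP + 0.5)  # round half up
    return max(1, steps) * SP_STEP


print(hours_to_sp(2))  # trivial config change
print(hours_to_sp(4))  # small focused task
print(hours_to_sp(8))  # full day's work
```

The cumulative-context adjustment described above is a judgement call made per thread and is not captured by this helper.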
In practice this work was performed by an engineer directing Claude Code agents, with roughly 5 hours of direct human attention across 10 calendar days and ~50 hours of autonomous agent compute. The SP estimates below represent what the same work would have cost a human engineer working alone, which is the relevant comparison for ROI assessment.
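The comparison can be made concrete with a short calculation. This is a rough sketch, assuming an 8-hour working day per SP and taking the "roughly 5 hours" figure at face value; it covers direct human attention only and ignores the cost of agent compute.

```python
HOURS_PER_SP = 8.0   # assumption: 1 SP = one 8-hour working day

estimated_sp = 31.5  # grand total from this analysis
human_hours = 5.0    # approximate direct human attention

solo_hours = estimated_sp * HOURS_PER_SP
attention_ratio = solo_hours / human_hours

print(f"Solo engineer estimate: {solo_hours:.0f} h")         # 252 h
print(f"Ratio to human attention: {attention_ratio:.1f}x")   # 50.4x
```

Both inputs are approximations, so the ratio is an order-of-magnitude indicator rather than a precise figure.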
| Phase | Period | Threads | SP | Result |
|---|---|---|---|---|
| Investigation & fixes | Apr 1-2 | 21 | 11 | 183ms → 57ms |
| SWO/trace infrastructure | Apr 1-2 | 12 | 6 | ITM working, DWT dead end |
| Profiling & OCRAM | Apr 7-8 | 13 | 6.25 | 57ms → 9.3ms |
| PXP, architecture, production | Apr 9-10 | 25 | 8.25 | 9.3ms → 5.9ms + runtime config |
| Total | Apr 1-10 | 71 | 31.5 | 31x improvement |
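The totals row can be cross-checked against the per-phase rows. The sketch below copies the thread counts and SP values from the table and recomputes the sums and each phase's share of the effort; the variable names are illustrative.

```python
# (threads, story points) per phase, copied from the summary table
phases = {
    "Investigation & fixes": (21, 11.0),
    "SWO/trace infrastructure": (12, 6.0),
    "Profiling & OCRAM": (13, 6.25),
    "PXP, architecture, production": (25, 8.25),
}

total_threads = sum(threads for threads, _ in phases.values())
total_sp = sum(sp for _, sp in phases.values())

for name, (threads, sp) in phases.items():
    share = 100 * sp / total_sp
    print(f"{name}: {threads} threads, {sp} SP ({share:.0f}% of effort)")
print(f"Total: {total_threads} threads, {total_sp} SP")
```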

Phase 1: Investigation & fixes (Apr 1-2)

| # | Thread | Category | Outcome | SP |
|---|---|---|---|---|
| 1 | Brainstorm synthesis (3-agent) | Analysis | Completed — ranked hypothesis list, top hypothesis wrong | 0.5 |
| 2 | Benchmark test suite creation | Test infra | Completed — 18 test scripts, _common.py helper | 1 |
| 3 | Baseline capture & Phase 0-2 tests | Analysis | Completed — 183ms baseline, rotation hypothesis refuted | 0.5 |
| 4 | MPU discovery and configuration | Driver fix | Completed — MPU found disabled; added TEX=1, C=1, B=1 region for SDRAM | 0.75 |
| 5 | Indev callback overhead investigation | Analysis | Completed then corrected — methodology artefact | 0.25 |
| 6 | GC heap size reduction | Optimisation | Completed — linker LCD buffers, 64→16MB heap, gc 107ms→61us | 1 |
| 7 | Font glyph cache (LV_CACHE_DEF_SIZE) | Optimisation | Dead end — controls image cache not fonts | 0.25 |
| 8 | Keyboard styling optimisation | Optimisation | Completed — flat-fill removes borders, 36% improvement | 0.75 |
| 9 | PXP draw acceleration verification | Analysis | Completed — PXP confirmed active (at this point) | 0.25 |
| 10 | LV_DEF_REFR_PERIOD reduction (33→16ms) | Optimisation | Completed — halves worst-case touch latency | 0.25 |
| 11 | GPIO13 interrupt support in MicroPython | Driver fix | Completed — IRQ handler, GT928Touch interrupt mode | 1 |
| 12 | Touch interrupt deadlock (micropython.schedule) | Driver fix | Completed — redesigned to timer-resume approach | 1 |
| 13 | Vsync spurious wait bug | Driver fix | Completed — s_transferDone stale flag, 62→57ms | 0.75 |
| 14 | DIRECT vs FULL render mode investigation | Analysis | Completed — FULL mode worse (95ms), DIRECT optimal | 0.25 |
| 15 | Linker script changes (lcd_buffers, gc_heap) | Driver fix | Completed — NOLOAD section, gc_heap capped at 16MB | 0.5 |
| 16 | LCD_FB_COUNT=3 (missing PXP rotation buffer) | Driver fix | Completed — was 2, needed 3 | 0.25 |
| 17 | Benchmark methodology correction | Test infra | Completed — tick_inc/sleep pattern caused accumulation artefact | 0.25 |
| 18 | Zephyr phyCORE-RT1170 config comparison | Analysis | Completed — no performance-relevant differences found | 0.5 |
| 19 | LV_USE_SYSMON perf monitor overlay | Optimisation | Completed — enabled, hidden behind keyboard widget | 0.25 |
| 20 | pyocd J-Link probe fix | Tooling | Completed — disable_dialog_boxes before open bug | 0.5 |
| 21 | probe-rs flash reliability | Tooling | Dead end — DAP FAULT on completion, abandoned | 0.25 |
| | Phase 1 total | | | 11 |

Phase 2: SWO/trace infrastructure (Apr 1-2)

| # | Thread | Outcome | SP |
|---|---|---|---|
| 22 | Orbuculum build and setup | Completed — built, connected, never produced profiling data | 0.25 |
| 23 | SWO via J-Link: M7 TPIU path | Dead end — M7 TPIU only outputs sync/timestamp | 1 |
| 24 | SWO via J-Link: GDB Server streaming | Completed — discovered monitor SWO EnableTarget | 0.25 |
| 25 | SWO: dual TPIU collision diagnosis | Partial — identified collision on GPIO_LPSR_11 | 0.25 |
| 26 | SWO: system SWO via APB-AP#2 | Completed — ITM printf proven working | 1 |
| 27 | SWO: OpenOCD config bugs | Completed — wrong AP and wrong clock | 0.25 |
| 28 | RT1170 CoreSight topology mapping | Completed — ROM table walk, ERR050708 errata | 0.75 |
| 29 | DWT PC sampling investigation | Dead end — counters work but trace packets non-functional | 1 |
| 30 | IOMUXC_LPSR_GPR37 / debug auth | Partial — reads 0x00, open lead | 0.25 |
| 31 | SWO firmware init (board_init.c) | Completed — board_config_swo_trace in SystemInitHook | 0.5 |
| 32 | SWO-Lite / SSPSR=0 / FFCR analysis | Completed — unimplemented registers, not failures | 0.25 |
| 33 | NXP AN14071 / AN13234 / ERR050708 research | Completed — key references identified | 0.25 |
| | Phase 2 total | | 6 |

Phase 3: Profiling & OCRAM (Apr 7-8)

| # | Thread | Category | Outcome | SP |
|---|---|---|---|---|
| 34 | Touch interrupt v2 validation | Analysis | Completed — stable 30s+ typing, no deadlock | 0.25 |
| 35 | Render profile test (buttonmatrix) | Analysis | Completed — identified widget overhead bottleneck | 0.75 |
| 36 | Buttonmatrix clip area fix | Optimisation | Completed — 102ms→25.7ms (4x), upstream PR #9946 | 0.75 |
| 37 | Flush profiling instrumentation | Analysis | Completed — flush not the bottleneck (<5%) | 0.5 |
| 38 | DWT CYCCNT instrumentation (refr cycle) | Analysis | Completed — obj_walk 95-97%, draw_wait <0.5% | 1 |
| 39 | Render breakdown test suite | Test infra | Completed — test_render_breakdown.py, test_phase_timing.py | 0.5 |
| 40 | OCRAM heap hypothesis test | Analysis | Completed — 25.7ms→8.3ms proof of concept (512KB) | 0.5 |
| 41 | OCRAM+OCRAM2 linker configuration | Driver fix | Completed — 1.25MB contiguous heap, 9.3ms final | 0.5 |
| 42 | Touch interrupt ISR init race fix | Driver fix | Completed — _read_timer=None guard | 0.25 |
| 43 | LVGL upstream PR review (Copilot) | Code review | Completed — margin, helper, bare blocks, row break | 0.5 |
| 44 | LV_DRAW_BUF_ALIGN submodule discovery | Tooling | Completed — colleague found ALIGN=4 vs 32 | 0.25 |
| 45 | Submodule branch pointer updates | Tooling | Completed — pinned to fork with clip fix | 0.25 |
| 46 | Textarea invalidation bottleneck docs | Documentation | Completed — why interactive typing still slow | 0.25 |
| | Phase 3 total | | | 6.25 |

Phase 4: PXP, architecture, production (Apr 9-10)

| # | Thread | Category | Outcome | SP |
|---|---|---|---|---|
| 47 | PXP draw acceleration enable | Optimisation | Completed — 9.3ms→5.9ms (169 FPS, 31x from baseline) | 0.25 |
| 48 | PXP rotation flag investigation | Driver fix | Completed — found micropython.mk override to 0 | 0.25 |
| 49 | PXP rotation benchmark | Analysis | Completed — 39ms in FULL mode (26 FPS) | 0.25 |
| 50 | Display config spec (DISPLAY_CONFIG_SPEC.md) | Architecture | Completed — iterative Q&A, ownership boundaries | 0.5 |
| 51 | Display config Phase 1: display.mk + lcd_buffers.ld | Architecture | Completed — centralised LVGL flags, linker snippet | 1 |
| 52 | Display config Phase 2: runtime rotation guards | Driver fix | Completed — 16x #if DEMO_USE_ROTATE → runtime | 0.5 |
| 53 | Display config Phase 3: Python rotation API | Architecture | Completed — ILI9881CDisplay(rotation=270), touch auto-config | 0.75 |
| 54 | Review agent Phase 1 findings | Code review | Completed — legacy flags, include guard, #ifndef, default | 0.25 |
| 55 | Widget demo portrait layout | Test infra | Completed — keyboard at bottom, flex layout | 0.25 |
| 56 | Widget demo landscape layout | Test infra | Completed — rotation=270, touch mapped | 0.25 |
| 57 | aiorepl integration | Test infra | Completed — async REPL while demo runs | 0.25 |
| 58 | File-based rotation config (display.cfg) | Architecture | Completed — reads rotation at boot | 0.25 |
| 59 | SNVS register investigation | Dead end | Hard fault on access, needs unlock sequence | 0.25 |
| 60 | WAKE_UP pin investigation | Dead end | Not wired to expected GPIO, abandoned | 0.25 |
| 61 | Submodule state verification | Tooling | Completed — all remotes checked, micropython pushed | 0.25 |
| 62 | GitLab MR !49 preparation | Documentation | Completed — changes table with per-fix impact | 0.25 |
| 63 | Clean clone build verification | Test infra | Completed — fresh clone builds and runs | 0.25 |
| 64 | Colleague build failure diagnosis | Tooling | Completed — ALIGN=4 vs 32, 174ms result | 0.25 |
| 65 | Code review agents (4 parallel + validation) | Code review | Completed — architecture, quality, completeness, security | 0.25 |
| 66 | Screenshot test for partial-clip | Test infra | Completed — clipped container + scrolled buttonmatrix | 0.25 |
| 67 | mbm migration (v0.1.0 → v2.0.3 toml) | Tooling | Completed — config migrated, RTC.memory tracked | 0.25 |
| 68 | CI troubleshooting (GitHub failures) | Tooling | Completed — deferred complex fixes, verified settings | 0.25 |
| 69 | Git history cleanup | Tooling | Completed — consolidated across 4 repos | 0.5 |
| 70 | Code simplification review (/simplify) | Code review | Completed — automated complexity analysis | 0.25 |
| 71 | RTC.memory() tracking (micropython#18960) | Architecture | Completed — added to mbm.toml as tracked branch | 0.25 |
| | Phase 4 total | | | 8.25 |

| Phase | SP |
|---|---|
| Phase 1: Investigation & fixes | 11 |
| Phase 2: SWO/trace infrastructure | 6 |
| Phase 3: Profiling & OCRAM | 6.25 |
| Phase 4: PXP, architecture, production | 8.25 |
| Grand total | 31.5 |

| Category | SP | % | Threads |
|---|---|---|---|
| Driver fix | 7.75 | 25% | 14 |
| Analysis / profiling | 6 | 19% | 14 |
| Architecture | 3.5 | 11% | 7 |
| Optimisation | 4.25 | 13% | 10 |
| Test infrastructure | 3.5 | 11% | 9 |
| Tooling | 3.25 | 10% | 9 |
| Code review | 1.25 | 4% | 4 |
| SWO/trace | 6 | 19% | 12 |
| Documentation | 0.5 | 2% | 2 |
| Dead end | 3 | 10% | 6 |
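
Note: categories overlap (for example, an SWO dead end is counted under both SWO/trace and Dead end), so the SP, percentage, and thread columns sum to more than the 31.5 SP and 71 thread totals.

| Outcome | SP | % | Threads |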
|---|---|---|---|
| Completed (successful) | 27.75 | 88% | 62 |
| Dead end | 3 | 10% | 6 |
| Partial (open lead) | 0.5 | 2% | 2 |
| Completed then corrected | 0.25 | <1% | 1 |

| # | Thread | SP | Value of dead end |
|---|---|---|---|
| 7 | Font glyph cache | 0.25 | Showed LV_CACHE_DEF_SIZE sizes the image cache, so it cannot relieve a font bottleneck |
| 21 | probe-rs flash | 0.25 | Confirmed unreliable, avoided future use |
| 23 | M7 TPIU SWO path | 1 | Mapped incorrect path, led to system SWO discovery |
| 29 | DWT PC sampling | 1 | Confirmed silicon limitation, avoided further investment |
| 59 | SNVS registers | 0.25 | Identified need for RTC.memory() upstream feature |
| 60 | WAKE_UP pin | 0.25 | Board routing confirmed, saved future investigation |

Total dead end investment: 3 SP (10% of total). Most produced useful negative results.

| Thread(s) | Change | Before | After | Improvement |
|---|---|---|---|---|
| 4 | MPU cacheable SDRAM | — | — | 31% on fills |
| 6, 15 | GC heap 64→16MB + linker buffers | 183ms | 57ms | 3.2x |
| 8 | Flat-fill keyboard styling | 184ms | 118ms | 36% |
| 13 | Vsync spurious wait fix | 62ms | 57ms | 8% |
| 36 | Buttonmatrix clip area fix | 102ms | 25.7ms | 4x |
| 41 | GC heap → OCRAM 1.25MB | 25.7ms | 9.3ms | 2.8x |
| 47 | PXP draw acceleration | 9.3ms | 5.9ms | 1.6x |
| | Cumulative | 183ms | 5.9ms | 31x |
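The Improvement column mixes speedup factors (such as 4x) with percentage reductions (such as 36%). The sketch below recomputes both from the Before and After values in the table so the rows can be compared on one scale; the helper names `speedup` and `reduction_pct` are illustrative.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Speedup factor: how many times faster the frame renders."""
    return before_ms / after_ms


def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Percentage reduction in frame time."""
    return 100.0 * (before_ms - after_ms) / before_ms


# (change, before ms, after ms) for rows that list both timings
steps = [
    ("Flat-fill keyboard styling", 184.0, 118.0),
    ("Vsync spurious wait fix", 62.0, 57.0),
    ("Buttonmatrix clip area fix", 102.0, 25.7),
    ("GC heap to OCRAM 1.25MB", 25.7, 9.3),
    ("PXP draw acceleration", 9.3, 5.9),
    ("Cumulative", 183.0, 5.9),
]

for name, before, after in steps:
    print(f"{name}: {speedup(before, after):.2f}x, "
          f"{reduction_pct(before, after):.0f}% less frame time")
```

On this scale the 36% flat-fill improvement corresponds to roughly a 1.6x speedup, comparable in size to the PXP step.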