Datacentre under the desk: your own personal AI

The Tesla V100 is a 2017 datacentre card that’s gone pretty cheap on the second-hand market, and it turns out it’s still a genuinely good bang-for-buck way to run LLMs locally in 2026. 16 GB of HBM2 at around 900 GB/s, and that memory bandwidth is the thing that actually matters for token generation. I’ve been assembling a few of these into machines and putting together a setup pack so anyone buying one can get going without fighting the toolchain, so this post is partly the story of getting there and partly a pointer to the kit. ...

June 24, 2026 · 12 min · Andrew Leech

Running Local LLMs on an AMD APU Laptop with 56GB Unified Memory

I recently got a Lenovo ThinkPad P14s Gen 6 with the AMD Ryzen AI 9 HX PRO 370 and 56GB of LPDDR5x RAM. I wanted to see what I could actually run on it for local LLM inference, and it turns out you can run pretty large models if you know how to get around a couple of gotchas with the AMD iGPU memory model. The short version: I’m running Qwen3.5-35B-A3B (a 35 billion parameter MoE model) and Gemma-4-26B-A4B (26B, also MoE) locally, served as an OpenAI-compatible API accessible from other machines on my network. No discrete GPU required. I’ve since put them to work on a real batch classification task (triaging ~4000 GitHub issues for the MicroPython project) and compared their output quality against Claude Sonnet. ...

April 13, 2026 · 12 min · Andrew Leech