<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Gemma on alelec blog</title>
    <link>https://notes.alelec.net/tags/gemma/</link>
    <description>Recent content in Gemma on alelec blog</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 13 Apr 2026 10:00:00 +1000</lastBuildDate>
    <atom:link href="https://notes.alelec.net/tags/gemma/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Running Local LLMs on an AMD APU Laptop with 56GB Unified Memory</title>
      <link>https://notes.alelec.net/posts/running-local-llms-on-amd-apu-laptop/</link>
      <pubDate>Mon, 13 Apr 2026 10:00:00 +1000</pubDate>
      <guid>https://notes.alelec.net/posts/running-local-llms-on-amd-apu-laptop/</guid>
      <description>&lt;p&gt;I recently got a Lenovo ThinkPad P14s Gen 6 with the AMD Ryzen AI 9 HX PRO 370 and 56GB of LPDDR5x RAM. I wanted to see what I could actually run on it for local LLM inference, and it turns out you can run pretty large models if you know how to get around a couple of gotchas with the AMD iGPU memory model.&lt;/p&gt;
&lt;p&gt;The short version: I&amp;rsquo;m running Qwen3.5-35B-A3B (a 35-billion-parameter MoE model) and Gemma-4-26B-A4B (26B, also MoE) locally, served as an OpenAI-compatible API accessible from other machines on my network, with no discrete GPU required. I&amp;rsquo;ve since put them to work on a real batch classification task (triaging ~4000 GitHub issues for the MicroPython project) and compared their output quality against Claude Sonnet.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
