I’ve been using Claude Code daily for months now. It writes most of my code, documentation, commit messages, PR descriptions. The code is great. The prose, not so much.

Not because it’s bad writing, it’s perfectly competent. The problem is it sounds like AI. Every PR description comes out with “This PR implements…” and “Additionally…” and perfect parallel structure across all bullet points. It’s polished in a way that I’m not, and that’s exactly what makes it obvious.

I wanted to fix this. Not to deceive anyone, just so the text Claude generates actually sounds like something I’d write, saving me from rewriting everything by hand.

The approach

Claude Code has an output styles feature where you can provide a markdown file that guides how it writes prose. Most examples I’ve seen are pretty generic though, stuff like “be concise” and “use technical language”. These don’t capture what makes a specific person’s writing theirs.
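For reference, an output style is just a markdown file with a short frontmatter block, dropped into your Claude Code config directory (from memory it's ~/.claude/output-styles/, but check the current docs). Something roughly like this, with the frontmatter fields illustrative:

```markdown
---
name: My Voice
description: Prose style matching my own writing patterns
---

Write short, direct sentences. Use contractions. Never open with a
compliment or restate the question before answering it.
```

The body is free-form instructions, which is exactly why generic advice like "be concise" is the path of least resistance, and why most examples stop there.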

So I took a different approach. I collected about 140,000 words of my own writing from GitHub and Reddit, ran quantitative linguistic analysis on the corpus, then did qualitative pattern extraction with Claude itself. The result is a style guide grounded in actual data from my writing, not vibes.

I’m describing this as a personal writing style project, but the same process works for any style target. If you need Claude to match your company’s formal documentation tone, or write in the style of a specific regulatory framework, you’d just start with the appropriate source material instead of personal writing samples. Feed it your existing enterprise docs, your team’s technical standards, whatever you want it to emulate, and the analysis pipeline works the same way.

Collecting samples

I pulled from two main sources:

GitHub - 58 PRs and 11 issues from 2024, mostly MicroPython ecosystem contributions. Collected using the gh CLI, one text file per PR/issue with description and all comments. This gave me my “technical register” - how I write when I’m being precise about code.

Reddit - Full account export using rexport (a PRAW wrapper). This dumped about 25MB of JSON covering years of comments and posts. The Reddit data is where the casual voice lives - opinions, troubleshooting help, project announcements, random conversations about coffee and cars.
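The extraction step is mostly just walking the JSON and keeping the text fields. A rough sketch of the idea (the field names here follow the Reddit API, which rexport mirrors - comments carry `body`, self-posts carry `selftext` - but check the structure of your own dump):

```python
import json
from pathlib import Path

def extract_reddit_text(dump_path, out_path):
    """Pull plain text out of a rexport-style JSON dump into one corpus file.

    Assumes a flat JSON list of items with Reddit-API-style fields;
    adjust for however your own export is shaped.
    """
    items = json.loads(Path(dump_path).read_text())
    texts = []
    for item in items:
        # Comments have 'body', self-posts have 'selftext'.
        body = item.get("body") or item.get("selftext") or ""
        if body and body != "[deleted]":
            texts.append(body)
    # Blank line between items so paragraph counts stay meaningful.
    Path(out_path).write_text("\n\n".join(texts))
    return len(texts)
```

The `[deleted]` filter matters more than you'd expect; a few hundred deleted comments would otherwise skew the vocabulary counts.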

The two sources together are important. Technical writing on its own would give a one-dimensional picture. The Reddit data captures how I actually talk when I’m not being careful about it, which is where most of the distinctive patterns come from.

What the analysis found

I ran a Python script (standard library only, no NLP packages needed) that produces quantitative metrics across the full corpus. A couple of the interesting numbers:

Corpus size:              140,772 words, 7,178 sentences, 1,828 paragraphs
Avg sentence length:      19.6 words (median: 18.0)
Contraction rate:         3.36% of words
Passive voice:            8.7% of sentences
Flesch Reading Ease:      64.2 (Standard, ~9th grade)
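The script itself is nothing exotic. A stripped-down sketch of the sentence and contraction metrics looks roughly like this (the real script does more, and the word/sentence regexes here are deliberately naive - fine for corpus-level averages, not for parsing):

```python
import re
import statistics

def basic_metrics(text):
    """Rough sentence-length and contraction metrics, stdlib only."""
    # Naive splitter: break on runs of .!? followed by whitespace.
    sentences = [s for s in re.split(r"[.!?]+\s+", text) if s.strip()]
    # Treat apostrophes as word-internal so "I've" is one token.
    words = re.findall(r"[A-Za-z']+", text)
    contractions = [w for w in words if "'" in w]
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_sentence_len": statistics.mean(lengths) if lengths else 0,
        "median_sentence_len": statistics.median(lengths) if lengths else 0,
        "contraction_rate": len(contractions) / len(words) if words else 0,
    }
```

Keeping the apostrophe inside the word token is the one detail worth getting right; tokenise on whitespace alone and the contraction rate comes out garbage.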

The punctuation analysis was particularly revealing:

Punctuation                 Count    Per 1000 words
slashes (/)                  6299             44.7
commas (,)                   5412             38.4
semicolons (;)                132              0.9
em-dashes (--)                 15              0.1
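Counting these is a few lines of the same stdlib approach, with the per-1000-word normalisation from the table above (the marks tracked here are a subset of what the full script counts):

```python
import re

# Marks to track -> the literal string to count.
PUNCT = {"slashes": "/", "commas": ",", "semicolons": ";", "em_dashes": "--"}

def punct_per_1000(text):
    """Raw counts and per-1000-word rates for a few punctuation marks."""
    n_words = len(re.findall(r"[A-Za-z']+", text))
    out = {}
    for name, mark in PUNCT.items():
        count = text.count(mark)
        out[name] = (count, round(count / n_words * 1000, 1))
    return out
```

Normalising per 1000 words is what makes the numbers comparable across corpora of different sizes, and it's the form you'd put straight into the style guide.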

I use slashes at nearly 50x the rate I use semicolons. I basically never use em-dashes. These aren’t things I’d have thought to put in a style guide manually, but they’re a big part of why AI text doesn’t read like mine - Claude loves em-dashes and semicolons, I don’t use either.

The sentence starters were interesting too. “I” starts 9% of my sentences. Contractions as sentence openers are common (“I’ve” 3.5%, “I’m” 2.5%). Informal starters like “Yeah” (1.8%) and “Ah” (1.3%) show up regularly. Claude would never start a sentence with “Yeah” or “Ah” unless told to.
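The starter analysis is the same naive-splitter idea with a collections.Counter on the first word of each sentence - a sketch:

```python
import re
from collections import Counter

def sentence_starters(text, top=5):
    """Relative frequency of sentence-opening words (naive splitter again)."""
    sentences = [s.strip() for s in re.split(r"[.!?]+\s+", text) if s.strip()]
    # First word of each sentence, apostrophes kept so "I've" survives intact.
    first_words = [m.group() for s in sentences
                   if (m := re.match(r"[A-Za-z']+", s))]
    total = len(first_words)
    return [(w, n / total) for w, n in Counter(first_words).most_common(top)]
```

Run over 7,000+ sentences, this is where openers like “Yeah” and “Ah” surface - patterns you'd never notice reading your own writing back.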

Beyond the numbers, I ran three qualitative analysis passes with Claude agents reading through the actual text:

  • Source-specific patterns - How I structure PR descriptions vs Reddit comments vs bug reports. Turns out I have a consistent PR format (what it does, why, trade-offs, testing status) and a consistent Reddit comment format (direct answer first, personal experience backing, caveat, practical ending).
  • Contrastive analysis - The most valuable one. Take passages from my writing, write how AI would phrase the same content, and annotate what’s different. This produces concrete before/after pairs that work much better than abstract rules.
  • Register mapping - How my formality level shifts between contexts. I write differently in a GitHub PR review (“Ah yes, thanks for the reminder, I’ve rebased it now”) vs a Reddit opinion post vs a bug report with call stack traces.

What the style guide looks like

The final output is a structured markdown file with sections for each aspect of the voice. Here’s a few excerpts to give a sense of it.

The vocabulary mapping is a simple table of “use this, not the AI equivalent”:

Use this                  Not this
pretty (as modifier)      fairly, quite, rather
basically                 essentially
a couple of               several, a few
tricky                    challenging, complex
heaps                     a lot, extensively
just                      simply

The contrastive examples are probably the most effective part. Here’s one:

AI: “I am quite confident that both embedded fields, as well as related areas, will experience significant disruption from AI, similar to the rest of the software engineering industry.”

Author: “I’m pretty certain both / all embedded fields will be highly disrupted by AI, just like the rest of software engineering.”

Key differences: “Pretty certain” not “quite confident”. Slash in “both / all” as informal hedge. “Just like” not “similar to”. Shorter, punchier. Contraction.

And another:

AI: “Hello! I apologize for the delayed response. It looks like you’ve made quite good progress on your own!”

Author: “Hi, sorry I wasn’t able to help sooner, you’ve done pretty well though it seems!”

Key differences: “Hi,” not “Hello!”. Comma splice. “Pretty well though it seems” - double hedge with trailing “though”. “Sorry I wasn’t able to help sooner” is specific, not generic.

There’s also a “What This Voice Never Does” section that lists patterns completely absent from 140K words of writing. Some of the entries:

  • Never uses modern emoji. Only old-school text emoticons :-) :-D
  • Never opens with a compliment to the person being addressed. No “Great question!”
  • Never restates the question before answering it
  • Never summarizes at the end of a section or response
  • Never uses “I think” as false modesty before strong opinions, only for genuine uncertainty

These “never” rules are surprisingly effective. A lot of what makes AI text read as AI is the presence of patterns real people don’t use, not just the absence of patterns they do.

How to build your own

I’ve packaged the whole process into a Claude Code skill that walks you through it. It covers:

  1. Collect - Gathering writing samples from GitHub (gh CLI), Reddit (rexport), Stack Overflow, Slack, blogs, commit messages, or anywhere else you write
  2. Extract - Transforming raw data into a clean text corpus (scripts included for GitHub and Reddit)
  3. Analyse - Running quantitative metrics and qualitative pattern extraction (analysis script included, Python standard library only)
  4. Synthesise - Filling in a structured style guide template, creating a Claude Code output style, and optionally packaging as an installable skill

The skill includes:

  • scripts/analyze_style.py - Quantitative corpus analysis (sentence length, vocabulary, punctuation, readability, contractions, passive voice, n-grams)
  • scripts/collect_github.sh - Automated GitHub PR/issue collection
  • scripts/extract_reddit_text.py - Reddit JSON to plain text extraction
  • references/style-guide-template.md - Blank template with all sections to fill in

You need at least 20,000 words for a basic profile. 100,000+ words is where the statistical patterns become really reliable. Two different contexts (e.g., GitHub + Reddit, or work Slack + blog) give a much better result than one source alone because they capture how your voice shifts by register.

Download: create-writing-style-skill.zip

To install, unzip into ~/.claude/skills/create-writing-style/ and it’ll be available in your Claude Code sessions. Ask Claude to help you “create a writing style” and the skill takes over from there.

Oh, and to be clear, I don’t have a degree in linguistics. I worked with Claude every step of the way through this - from designing the analysis approach, to running the quantitative and qualitative passes, to converting the results into a style guide, then packaging it as a reusable skill for others. Including writing this post. The irony of using AI to teach AI to sound less like AI is not lost on me, but it works pretty well.