How To De-Slop A Codebase Ruined By AI (with one skill)

Matt Pocock · 11m 19s · 17 sources

Decision Card

Effort: One-evening trial — install the improve-codebase-architecture skill from Matt Pocock’s skills repo into .claude/skills/, run it once on an existing repo, and sit through one full “grilling session” on a single candidate (roughly 1–2 hours including reading the glossary).

Honest take: The title promises de-slopping “with one skill,” but in the video Pocock says outright that this is not an AFK skill: it demands sustained judgment calls from the human, and he recommends re-running it every couple of days. The skill only finds and discusses refactoring candidates; the actual fix still routes through you (or through a separate agent via a GitHub issue). The “seam” concept he presents is also Michael Feathers’ term from Working Effectively with Legacy Code, and it goes unattributed. The video’s glossary blends three sources (Ousterhout, Cockburn, Feathers) as if they were one framework.

Concrete next steps:

Read the skill’s actual SKILL.md and glossary at github.com/mattpocock/skills before running it (15 min); it states precisely what the video only paraphrases.
Run the skill on one real repo with auto-accept mode off, pick exactly one of the surfaced candidates, and drive it to a written module-shape proposal (1–2 hours).
If the proposal looks good, convert it to an issue for async pickup rather than merging live (15 min); see Sandcastle for the AFK-agent side of that pipeline.
Skip if you don’t have an AI-assisted codebase that has accumulated meaningful churn, or if you want a hands-off auto-refactoring tool — this workflow is explicitly human-in-the-loop and won’t run unattended.

TL;DR

Matt Pocock argues AI-assisted coding accelerates software entropy, and demonstrates his “improve-codebase-architecture” Claude skill, which scans a repo for shallow modules and proposes “deepening opportunities” using a shared glossary (module, interface, seam, adapter, depth, leverage, locality) drawn largely from Ousterhout’s A Philosophy of Software Design. The workflow is deliberately human-in-the-loop: the agent acts as a “tactical programmer” surfacing candidates and grilling you with design questions, while you make the strategic calls and optionally export the agreed design as an issue for an AFK agent.

Key Points

AI hasn’t made code cheap so much as it has accelerated software entropy: changes made without whole-codebase context compound into a “ball of mud.” 00:08
This video is the “cure” companion to his earlier prevention-focused deep-modules video; the skill ships in his skills repo (~41.5k stars at recording). 00:36
The skill ships a glossary of architecture terms because a shared vocabulary with the AI makes requests far more precise. 01:16
Core primitive: a module is any unit (a set of components, auth functions, a logger) with an interface, meaning everything a caller must know, including docs, plus an implementation. 01:42
Deep modules hide lots of implementation behind a simple interface; shallow modules invert that. The concepts come from John Ousterhout’s A Philosophy of Software Design. 02:45
Seams are where module interfaces meet the application — the natural place for unit/integration test boundaries and mocks. 03:47
Adapters (borrowed from hexagonal architecture) are concrete modules satisfying a seam’s interface — e.g. a real clock in production, a fake clock in tests. 04:14
Deep modules pay off twice: locality for maintainers (changes and bugs concentrate in one place) and leverage for callers (more capability per unit of interface learned). 04:48
Live demo on his ~1,500-commit course-video-manager repo (React Router + effect.ts): the skill found six deepening opportunities, including a concept with two parallel implementations and no single seam — a front-end/back-end desync risk. 05:34
The workflow ends in a “grilling session” of design questions, then a proposed module shape you can export as a GitHub issue for an AFK agent (see his Sandcastle video). 07:27
Explicit warning: this is not an AFK skill — agents are tactical programmers, you are the strategic one making judgment calls; run it every couple of days on fast-moving repos. 09:18
For legacy (i.e. bad) codebases, build a test harness around deep modules with clear seams before letting AI make changes — better tests directly improve agent output. 10:28
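
The seam and adapter points above can be made concrete with the clock example the video mentions. What follows is a minimal sketch, not code from the video or from the skill: the names Clock, systemClock, fakeClock, and isExpired are all hypothetical, chosen only to show one seam with a production adapter and a test adapter.

```typescript
// All names here are hypothetical illustrations, not taken from the video.

// The seam: the only thing the rest of the application knows about time.
interface Clock {
  now(): Date;
}

// Production adapter: reads the real system time.
const systemClock: Clock = {
  now: () => new Date(),
};

// Test adapter: always returns the same fixed instant.
function fakeClock(isoTimestamp: string): Clock {
  const fixed = new Date(isoTimestamp);
  return { now: () => new Date(fixed.getTime()) };
}

// Caller code depends on the seam, never on a concrete adapter.
function isExpired(expiresAt: Date, clock: Clock): boolean {
  return clock.now().getTime() >= expiresAt.getTime();
}

const deadline = new Date("2030-01-01T00:00:00Z");

// In tests, the fake adapter makes the outcome deterministic.
const beforeDeadline = isExpired(deadline, fakeClock("2029-12-31T23:59:59Z"));
const afterDeadline = isExpired(deadline, fakeClock("2030-01-01T00:00:01Z"));

// In production, the same call site takes the real adapter instead;
// this result depends on the date the code is run.
const liveResult = isExpired(deadline, systemClock);
```

Because isExpired only sees the Clock interface, swapping adapters changes behavior without editing the caller, which is close to how Feathers describes a seam.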

Notable Quotes

“AI has simply accelerated software entropy.” — 00:08

“A deep module hides lots of implementation behind a relatively simple interface.” — 02:45

“I think of agents as really, really good tactical programmers.” — 09:31

Verified Claims

Claim: Pocock’s GitHub skills repo was “currently sitting at 41.5k stars.” 01:08
Sources: mattpocock/skills on GitHub, Implicator: repo passes 45k stars
Verdict: Confirmed. The repo exists and reporting tracks it crossing 45k+ (and later ~53k) stars, consistent with 41.5k at recording time.

Claim: The improve-codebase-architecture skill recently gained a glossary of architecture terminology (module, interface, depth, seam, adapter, leverage, locality). 01:12
Sources: SKILL.md in mattpocock/skills, Tessl registry entry
Verdict: Confirmed. The published SKILL.md defines exactly this vocabulary, including “the interface is the test surface.”

Claim: Deep vs. shallow modules come from John Ousterhout’s A Philosophy of Software Design. 02:56
Sources: “Modules Should Be Deep!” (softengbook.org), Deep vs shallow modules (Sandor Dargo)
Verdict: Confirmed. The book’s central thesis matches the video’s definitions almost verbatim (note the author is Ousterhout, not “Osterhout”).

Claim: “Adapter” is a term taken from hexagonal architecture. 04:14
Sources: Alistair Cockburn’s hexagonal architecture page, Wikipedia: Hexagonal architecture
Verdict: Confirmed. Cockburn’s ports-and-adapters pattern uses adapters exactly this way, with test doubles as alternate adapters for a port.

Claim (implicit): “Seam” as the place where you alter/test behavior is standard architecture vocabulary. 03:47
Sources: Michael Feathers, “Seams” (InformIT excerpt), Martin Fowler: Legacy Seam
Verdict: Confirmed, but the term originates with Michael Feathers’ Working Effectively with Legacy Code, which the video never credits. Feathers defines it as a place to alter behavior without editing code there, slightly different from Pocock’s “where the interface lives.”

Claim: TanStack Query is an example of a deep module: lots of complexity behind a super simple interface. 03:21
Sources: TanStack Query docs, useQuery reference
Verdict: Confirmed as a fair characterization. useQuery needs only a key and a promise-returning function while hiding caching, deduping, retries, and background refetching.

Claim: His demo repo is a React Router application using effect.ts under the hood. 05:43
Sources: React Router, Effect website
Verdict: Confirmed both are real, actively maintained libraries (the repo itself is his private course-video-manager, so the pairing is taken at his word).

Claim: Sandcastle is his tool/video for AFK agents that pick up issues. 07:54
Sources: mattpocock/sandcastle on GitHub, YouTube: “I Open-Sourced My Own AFK Software Factory”
Verdict: Confirmed. Sandcastle is his open-source TypeScript framework for orchestrating sandboxed coding agents in parallel.
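
To illustrate what “lots of complexity behind a super simple interface” can look like, here is a small hand-rolled sketch of a deep module. It is explicitly not TanStack Query’s API or implementation: createQueryClient, query, staleTimeMs, and demo are hypothetical names, and only two of the hidden behaviors (caching and deduping of concurrent requests) are modeled.

```typescript
// Hypothetical sketch of a deep module. NOT TanStack Query's API or internals.
type Fetcher<T> = () => Promise<T>;

interface CacheEntry {
  value: unknown;
  storedAt: number;
}

function createQueryClient(staleTimeMs: number) {
  // Implementation details the caller never has to learn.
  const cache = new Map<string, CacheEntry>();
  const inFlight = new Map<string, Promise<unknown>>();

  // The whole interface: a key and a promise-returning function.
  async function query<T>(key: string, fetcher: Fetcher<T>): Promise<T> {
    const cached = cache.get(key);
    if (cached && Date.now() - cached.storedAt < staleTimeMs) {
      return cached.value as T; // fresh cache hit, no fetch
    }
    const pending = inFlight.get(key);
    if (pending) {
      return pending as Promise<T>; // join the request already in flight
    }
    const request = fetcher()
      .then((value) => {
        cache.set(key, { value, storedAt: Date.now() });
        return value;
      })
      .finally(() => inFlight.delete(key));
    inFlight.set(key, request);
    return request;
  }

  return { query };
}

// Usage: three calls for the same key, counting how often the fetcher runs.
async function demo(): Promise<number> {
  const client = createQueryClient(60000);
  let calls = 0;
  const fetchUser = async () => {
    calls += 1;
    return { id: 1, name: "Ada" };
  };
  // Two concurrent calls share one in-flight request.
  await Promise.all([
    client.query("user:1", fetchUser),
    client.query("user:1", fetchUser),
  ]);
  // A later call within the stale time is served from the cache.
  await client.query("user:1", fetchUser);
  return calls;
}
```

In this sketch the caller learns one function signature, while the cache and in-flight bookkeeping stay local to the module. That is the leverage and locality trade the video describes.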

Tools, Papers & Standards Mentioned

improve-codebase-architecture skill — SKILL.md source
Matt Pocock’s skills repo — github.com/mattpocock/skills (also browsable at skills.sh)
A Philosophy of Software Design, John Ousterhout — summarized at softengbook.org “Modules Should Be Deep!”
Hexagonal architecture / ports and adapters, Alistair Cockburn — alistair.cockburn.us/hexagonal-architecture
Seams (Working Effectively with Legacy Code, Michael Feathers) — InformIT chapter excerpt
TanStack Query — tanstack.com/query
React Router — reactrouter.com
Effect (effect.ts) — effect.website / github.com/Effect-TS/effect
Sandcastle — github.com/mattpocock/sandcastle
Claude Code (skills host) — demo runs inside a Claude session; skill format is Claude Code’s .claude/skills/ convention

Follow-up Questions

How does the skill’s “deepening opportunity” detection actually work under the hood — is it pure LLM exploration guided by the glossary, or does the SKILL.md encode concrete heuristics (e.g. the “one adapter = hypothetical seam, two = real” rule) that could be evaluated for precision/recall on a known-messy repo?
Pocock claims better tests directly improve agent output — is there measurable evidence (benchmarks, SWE-bench-style evals) that agent success rates rise with test-harness coverage and seam clarity in the target repo?
How does the grilling-session-to-issue-to-AFK-agent pipeline (this skill + Sandcastle) compare with deletion-first cleanup approaches (e.g. OMC’s ai-slop-cleaner) for reversing AI-generated slop — which produces fewer regressions per refactor?