I Stopped Hitting Claude Code Usage Limits (Here's How)

Brad | AI & Automation · 11m 00s · Watch on YouTube · 19 sources

Decision Card

Effort: Afternoon tune-up — run /context and /mcp in a fresh session, disable unused MCP servers, trim CLAUDE.md under ~200 lines, and add two or three settings.json entries; roughly 1–2 hours total, each item adoptable independently.

Honest take: The video’s central “message 30 costs 31x more” framing ignores prompt caching (Claude Code caches automatically; cache reads cost ~10% of normal input tokens), and its biggest recommendation — pruning MCP tool definitions — is partly obsolete because current Claude Code defers MCP tool schemas by default and only loads names until a tool is used; the “free” audit skill is also a funnel for the creator’s paid marketplace waitlist and strategy sessions.

Concrete next steps:

Run /context + /mcp and disable unused MCP servers — adopt (5 min; still worthwhile even with deferred tool loading, and it’s Anthropic’s own advice in the official cost-reduction docs)
Replace MCP servers with CLIs (gh, aws, playwright CLI, etc.) where available — adopt (30 min; officially endorsed as more context-efficient, though the “40%” figure is unverified)
Trim CLAUDE.md with progressive disclosure (core rules + one-line pointers to reference files) — adopt (30–60 min; matches Anthropic’s documented “under 200 lines” guidance)
Audit installed skills for verbosity — try (15 min; only metadata preloads per skill, so gains are modest unless you have dozens installed)
Set BASH_MAX_OUTPUT_LENGTH and an auto-compact override — try (10 min; the env vars are real, but the video misstates the unit — it’s 30,000 characters by default, not “30–50k tokens”, and output is middle-truncated rather than silently dropped)
Add Read() deny rules for node_modules/dist/lock files — try (10 min; good security hygiene, but weak as a token saver — Claude doesn’t preload those directories, and deny rules have documented silent-failure bugs and don’t block cat in Bash)
Adopt the habits: /clear between tasks, plan mode first, edit bad messages instead of correcting, model routing — adopt (free; all four are in Anthropic’s official guidance)
Download the creator’s Google Drive audit skill / join marketplace waitlist — skip (unreviewed skill distributed via Drive link is exactly the install-unvetted-code risk the video itself warns about; the official docs cover the same checklist)

TL;DR

The video argues Claude Code usage limits are usually a context-hygiene problem, not a limits problem: MCP tool schemas, bloated CLAUDE.md files, verbose skills, and misconfigured settings silently consume tens of thousands of tokens before you type a message, and every message re-pays that cost. Fixes include pruning MCP servers (or swapping them for CLIs), progressive disclosure in CLAUDE.md, settings tweaks (auto-compact threshold, bash output length, deny rules), and daily habits (/clear between tasks, plan mode, editing instead of correcting, right-sizing the model) — plus a promo for the creator’s free “context audit” skill and paid marketplace waitlist.

Key Points

Every message re-sends the whole conversation, so per-message cost compounds — the video claims message 30 costs “31 times more” than message 1 (true only without prompt caching). 00:42
CLAUDE.md files, MCP server schemas, and skill metadata are preloaded into context at session start; run /context in a fresh session to see the baseline — the creator’s was over 50,000 tokens. 01:31
Bloated context can worsen output because the model under-attends parts of a long window (“lost in the middle” effect — though the video garbles it as only missing the beginning). 01:05
MCP servers are the #1 offender: one server can carry ~18,000 tokens of tool definitions, and several can exceed 70,000 tokens of “dead weight” — fix by running /mcp and disabling unused servers. 02:16
Replacing MCPs with CLIs (e.g. Playwright, Apify) saves tokens because a CLI costs tokens only when invoked, not by existing; the creator claims ~40% savings. 02:53
Audit CLAUDE.md with five filters (default behavior? contradicts? repeats? band-aid? vague?) and cut anything that fails one. 03:48
Use progressive disclosure: keep only universal rules in CLAUDE.md and move API conventions, testing guidelines, etc. into reference files with one-line pointers Claude reads on demand. 04:25
Settings fixes: lower the auto-compact trigger to ~75% (default fires around 83%, after quality already degrades), raise BASH_MAX_OUTPUT_LENGTH from its ~30–50k default to avoid truncation-and-retry waste, and add deny rules so Claude can’t read node_modules/dist/lock files. 06:22
Daily habits: /clear between unrelated tasks (called the single biggest saver), plan mode before non-trivial work, edit bad exchanges instead of sending follow-up corrections, and route Sonnet/Haiku/Opus by task difficulty. 08:53
Closing thesis: usage limits are a context-hygiene problem and setups drift, which is why the audit ships as a rerunnable skill — free via a Google Drive link, alongside a paid marketplace waitlist. 10:27

Notable Quotes

“In fact, message 30 actually costs 31 times more than your first message when you’re in a Claude code session.” 00:42

“A CLI only costs tokens when Claude actually calls that command, and an MCP server costs tokens just by existing in your session.” 02:53

“It’s not a limits problem, it’s a context hygiene problem and your setup drifts over time.” 10:27

Verified Claims

Claim: Claude Code rereads the whole conversation each message, so message 30 costs ~31x message 1. 00:42 Sources: Anthropic — Prompt caching, Claude Code — Manage costs Verdict: Disputed — the resend mechanics are real, but Claude Code applies prompt caching automatically and cache reads cost ~10% of standard input tokens, so the raw 31x arithmetic substantially overstates real cost growth.
Claim: Every connected MCP server loads all its tool definitions into context on every message; one server can be ~18,000 tokens. 02:16 Sources: Claude Code — Reduce MCP server overhead, MCP spec issue #2808 on tool schema overhead Verdict: Disputed (outdated) — historically accurate (community measurements ranged from ~1k tokens per tool up to ~55k for the GitHub server), but current Claude Code defers MCP tool definitions by default via tool search, loading only tool names until a tool is used.
Claim: Replacing MCP servers with CLIs saves tokens (~40%) because CLIs cost nothing until invoked. 03:02 Sources: Claude Code — Manage costs (“Prefer CLI tools when available”), Anthropic — Code execution with MCP Verdict: Confirmed in direction, number unverified — Anthropic’s own docs recommend CLIs like gh and aws as more context-efficient than MCP servers; the specific 40% figure appears to be the creator’s anecdote.
Claim: CLAUDE.md is loaded into context at the start of every session and should be kept lean with progressive disclosure to reference files. 03:13 Sources: Claude Code — Manage costs (move instructions from CLAUDE.md to skills; keep under 200 lines), HumanLayer — Writing a good CLAUDE.md Verdict: Confirmed — official docs recommend keeping CLAUDE.md under ~200 lines and loading specialized instructions on demand.
Claim: Every installed skill’s metadata is preloaded into context so Claude can decide whether to trigger it. 05:19 Sources: Anthropic — Skill authoring best practices, Anthropic — Lessons from building Claude Code: how we use skills Verdict: Confirmed — skill name/description metadata preloads while the body loads on invocation (Anthropic recommends bodies under ~500 lines), though this also means preload cost per skill is small, softening the video’s alarm.
Claim: Auto-compact fires around 83% of the context window by default and can be lowered (e.g. to 75) via an override setting. 06:22 Sources: anthropics/claude-code issue #15719 (configurable compaction threshold), BSWEN — Claude Code auto-compact settings Verdict: Confirmed (community-documented) — the ~83% observed trigger and a percentage-override environment variable are reported across community sources and GitHub issues, though not prominent in official docs.
Claim: Bash output is truncated at a 30–50k default and “silently” dropped, causing wasteful reruns; fix by setting the max output env var to “150,000 tokens.” 06:43 Sources: Claude Code — Environment variables, anthropics/claude-code issue #19901 (30k character Bash limit) Verdict: Confirmed with corrections — BASH_MAX_OUTPUT_LENGTH exists and defaults to 30,000 characters (not tokens), and Claude Code middle-truncates (keeps beginning and end) rather than silently dropping everything past the limit.
Claim: Adding deny rules to settings.json permissions works “like a gitignore” and stops Claude reading node_modules, dist, and lock files. 07:11 Sources: Claude Code — Configure permissions, anthropics/claude-code issue #24846 (Read deny rules not enforced for .env) Verdict: Confirmed feature, disputed framing — Read() deny rules exist, but they gate the Read tool (not Bash subprocesses like cat), have documented silent-failure bugs, and Claude doesn’t preload denied directories anyway — so this is security hygiene more than a meaningful token saver.
Claim: As context bloats, the model attends mainly to the end of the window and misses the beginning, degrading output. 01:05 Sources: Liu et al., “Lost in the Middle” (arXiv:2307.03172 lineage; see follow-up arXiv:2311.09198), Atlan — Lost-in-the-middle problem Verdict: Confirmed with nuance — long-context degradation is well established, but the research shows a U-shape: models favor both beginning and end and lose the middle, not the beginning as the video states.
Claim: /clear between unrelated tasks is one of the biggest token savers. 08:53 Sources: Claude Code — Manage costs (“Clear between tasks”) Verdict: Confirmed — Anthropic’s docs list clearing between unrelated tasks as a top habit, and note unexpectedly high spend usually traces to never-cleared long sessions.

Tools, Papers & Standards Mentioned

Claude Code — https://code.claude.com/docs
CLAUDE.md / memory files — Claude Code memory docs referenced via cost guide
MCP (Model Context Protocol) — https://github.com/modelcontextprotocol/modelcontextprotocol
Claude skills — Skill authoring best practices
settings.json permissions / deny rules — Configure permissions
BASH_MAX_OUTPUT_LENGTH — Environment variables
Prompt caching — Anthropic prompt caching docs
Playwright MCP / Apify — mentioned as MCP-to-CLI swaps; no canonical link given in video
BMAD / PRD planning frameworks — mentioned by name only; no canonical source given in video
Claude models (Sonnet, Haiku, Opus) — Anthropic pricing/model docs
“Lost in the middle” research — arXiv long-context position-bias line of work

Follow-up Questions

With MCP tool-search/deferred loading now default in Claude Code, how much measurable per-session token overhead do typical MCP servers still add, and at what point does swapping to a CLI stop paying off?
What does the auto-compact override actually preserve versus discard at different thresholds — is triggering compaction at 75% measurably better for output quality than 83%, or does earlier summarization lose useful detail?
How much of subscription-plan limit consumption is driven by output tokens (including extended thinking) versus resent input context — i.e., would /effort or thinking-budget tuning save more than all the context pruning in this video combined?