I Stopped Hitting Claude Code Usage Limits (Here's How)
Decision Card
Effort: A focused afternoon — run /context in a fresh session, then work down the list: /mcp to disable unused servers, swap one or two MCPs for their CLIs, split your CLAUDE.md into reference files, and add ~4 lines to settings.json (auto-compact override, BASH_MAX_OUTPUT_LENGTH, deny rules). Each change is a few minutes; the CLAUDE.md restructure is the longest part.
Honest take: The habits are sound, but the video’s headline villain is partly outdated: current Claude Code already defers MCP tool definitions by default (MCP Tool Search), so the “18k tokens per server on every message” framing describes the old behavior, not a fresh install today. The genuinely free wins here are the settings.json tweaks and /clear discipline; the rest is good hygiene dressed up as a secret. Also note the “audit skill” is gated behind an email-waitlist / Google Drive funnel and a forthcoming paid marketplace — the advice is more reliable than the lead magnet.
Concrete next steps (per item — adopt / try / skip):
- adopt: /clear between unrelated tasks. Single highest-leverage habit; costs nothing. Confirmed by Anthropic’s cost docs.
- adopt: Add to settings.json: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75, BASH_MAX_OUTPUT_LENGTH=150000, and permissions.deny for node_modules, dist, lockfiles. ~10 min. See permissions docs.
- adopt: Progressive disclosure for CLAUDE.md (keep it < ~200 lines, point to reference files). Endorsed verbatim by Anthropic.
- try: Replace heavy MCPs (Playwright, Apify) with their CLIs where one exists. Real savings, but the “40%” figure is the author’s anecdote, not a measured benchmark.
- try: Run /mcp and disable servers you won’t use this session. Helpful, but less urgent now that tool defs are deferred by default.
- try: Plan mode (Shift+Tab) before non-trivial work; use /rewind instead of stacking corrections. Both are documented cost-savers.
- skip the funnel: Don’t chase the “context audit skill” via the Google Drive link or join the “verified marketplace” waitlist; the same checklist is in Anthropic’s free cost docs and you can do it by hand.
- Skip entirely if you run vanilla Claude Code with no MCP servers, a short CLAUDE.md, and few skills — your starting context is already small and most of this won’t move the needle.
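The settings.json changes named in the card can be sketched as a small Python script that prints the fragment. This is a minimal sketch, not a verified configuration: the function name build_settings is hypothetical, the placement of the two variables under an "env" block and the Read(...) deny patterns follow the general shape of Anthropic's settings and permissions docs, and the specific lockfile names are assumptions to adapt to your project.

```python
import json


def build_settings(autocompact_pct=75, bash_max_output=150_000):
    """Return a dict shaped like a Claude Code settings.json fragment.

    Values mirror the card: compact at 75%, allow 150,000 characters
    of Bash output, and deny reads of bulky generated paths.
    """
    return {
        # Environment variables are written as strings.
        "env": {
            "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": str(autocompact_pct),
            "BASH_MAX_OUTPUT_LENGTH": str(bash_max_output),
        },
        "permissions": {
            # Assumed patterns; check them against the permissions docs.
            "deny": [
                "Read(./node_modules/**)",
                "Read(./dist/**)",
                "Read(./package-lock.json)",
                "Read(./yarn.lock)",
            ]
        },
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into settings.json by hand.
    print(json.dumps(build_settings(), indent=2))
```

Merging by hand rather than overwriting keeps any existing settings intact. Given the open issues about deny rules being enforced inconsistently, treat the deny list as a hint rather than a hard guarantee.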
TL;DR
A Claude-Code-specific guide to cutting “invisible” starting context — MCP servers, bloated CLAUDE.md files, verbose skills, and stale settings.json defaults — plus day-to-day habits like /clear, plan mode, and right-sizing the model. The advice is mostly aligned with Anthropic’s own cost guidance, though the video overstates MCP token cost for current versions and wraps the takeaways around a free-skill lead magnet and a paid marketplace waitlist.
Key Points
- Every message re-sends the whole conversation, so cost compounds turn over turn — the author claims message 30 costs ~31× the first 00:42.
- Bloated context isn’t just costlier; the model attends to the ends and “misses the middle,” so you pay more and get worse output 01:05.
- Run /context in a fresh session to see tokens you’re paying before sending anything; the author measured 50,000+ tokens of baseline overhead 01:48.
- MCP servers were his #1 cost; he disables unused ones via /mcp and replaces MCPs with CLIs, claiming ~40% token savings 02:42.
- Audit CLAUDE.md for contradictions, cut rules that fail five filters, and apply progressive disclosure: keep universal rules, push the rest to reference files loaded on demand 03:20.
- Skill metadata loads into context for every installed skill; verbose 400–800-line skills burn context and past a point Claude starts ignoring rules 05:16.
- settings.json wins: lower the auto-compact trigger to ~75% before quality degrades, raise BASH_MAX_OUTPUT_LENGTH to 150,000 to avoid truncation-and-retry waste 06:20.
- Add deny rules (git-ignore-style) so Claude can’t read node_modules, dist, and lockfiles it doesn’t need 07:00.
- Habits: /clear between unrelated tasks, use plan mode before anything non-trivial, and don’t stack follow-up corrections; replace the bad exchange instead 09:10.
- Right model for the job: Sonnet for most coding, Haiku for sub-agents and lookups, Opus for deep architecture 10:12.
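The line budgets in this summary (CLAUDE.md under roughly 200 lines, skills turning verbose at 400 to 800 lines) are easy to check mechanically. The sketch below is one way to do that by hand; the function names and the example file paths are hypothetical, and the budgets are rules of thumb from the summary, not limits enforced by Claude Code.

```python
from pathlib import Path


def count_lines(path):
    """Count lines in a text file, tolerating odd encodings."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return sum(1 for _ in fh)


def audit(files):
    """Return (path, line_count, budget) for each file over its budget.

    `files` maps a file path to its maximum line count.
    Paths that do not exist are skipped silently.
    """
    findings = []
    for path, budget in files.items():
        if not Path(path).is_file():
            continue
        lines = count_lines(path)
        if lines > budget:
            findings.append((str(path), lines, budget))
    return findings


if __name__ == "__main__":
    # Hypothetical paths; point these at your own files.
    targets = {"CLAUDE.md": 200, ".claude/skills/example/SKILL.md": 400}
    for path, lines, budget in audit(targets):
        print(f"{path}: {lines} lines (budget {budget})")
```

A file that shows up here is a candidate for progressive disclosure: keep the universal rules in place and move the rest into reference files.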
Notable Quotes
“message 30 actually costs 31 times more than your first message when you’re in a Claude code session.” 00:42
“a CLI only costs tokens when Claude actually calls that command, and an MCP server costs tokens just by existing in your session.” 02:53
“it’s not a limits problem, it’s a context hygiene problem and your setup drifts over time.” 10:25
Verified Claims
Each message re-sends the full conversation, so token cost grows quadratically as the session lengthens. 00:42
- Expensively Quadratic: the LLM Agent Cost Curve — exe.dev, Context Management for AI Agents — Medium
- Verdict: Confirmed (the O(n²) cumulative cost is real without caching; the exact “31×” multiplier is illustrative, and prompt caching mitigates it in practice).
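The quadratic growth in this verdict can be made concrete with a few lines of arithmetic. This is a simplified model under stated assumptions: every turn adds the same number of tokens, nothing is cached or compacted, and the function names and the 1,000-token figure are illustrative. It does not reproduce the video's exact “31×” multiplier, which depends on how turns are counted.

```python
def input_tokens_at_turn(turn, tokens_per_turn, baseline=0):
    """Tokens sent on a given 1-indexed turn.

    Each turn re-sends the fixed baseline plus every turn so far,
    so the per-turn input grows linearly with the turn number.
    """
    return baseline + turn * tokens_per_turn


def cumulative_input_tokens(turns, tokens_per_turn, baseline=0):
    """Total tokens sent across a whole session.

    Summing a linearly growing per-turn cost gives quadratic growth:
    tokens_per_turn * turns * (turns + 1) / 2, plus baseline * turns.
    """
    return sum(
        input_tokens_at_turn(k, tokens_per_turn, baseline)
        for k in range(1, turns + 1)
    )


if __name__ == "__main__":
    for turns in (10, 30, 60):
        total = cumulative_input_tokens(turns, tokens_per_turn=1_000)
        print(f"{turns} turns -> {total:,} input tokens")
```

In this model, doubling a session from 30 to 60 turns roughly quadruples the total input tokens, which is why clearing between unrelated tasks pays off more than trimming any single message.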
Bloated context degrades output because models attend to the start/end and lose the middle. 01:05
- Context Rot — Morph, Lost-in-the-middle discussion — Medium
- Verdict: Confirmed (the “lost in the middle” effect, Liu et al. 2023, is well documented).
Each connected MCP server injects ~18,000 tokens of tool definitions into every message, and stacking servers can exceed 70,000 tokens of dead weight. 02:04
- MCP Server Token Overhead — MindStudio, Connect Claude Code to tools via MCP — Anthropic docs, Reduce token usage — Anthropic docs
- Verdict: Disputed/outdated — the per-server overhead was real, but current Claude Code defers MCP tool definitions by default (only tool names load until a tool is used), so the “every message” framing no longer applies to a fresh install.
Replacing MCP servers with CLIs only costs tokens when the command runs, saving ~40%. 02:53
- Reduce MCP server overhead — Anthropic docs
- Verdict: Partially confirmed. Anthropic explicitly recommends preferring CLIs (gh, aws, gcloud) as more context-efficient; the specific “40%” number is the author’s anecdote, unverified.
Auto-compact triggers around 83% by default and can be lowered (e.g. to 75%) via an override. 06:20
- CLAUDE_AUTOCOMPACT_PCT_OVERRIDE Guide — TurboAI, GitHub issue #28728 — anthropics/claude-code
- Verdict: Confirmed. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE exists, ~83% is the documented default ceiling, and the override can only fire earlier (values above ~83% are clamped).
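The clamping described in this verdict amounts to taking the smaller of the override and the default. The sketch below models that description only; it is not Claude Code's actual implementation. The function names are hypothetical, the 83 is the approximate default quoted above, and the 200,000-token context window is an assumed figure for illustration.

```python
DEFAULT_TRIGGER_PCT = 83  # approximate default ceiling quoted in the verdict


def effective_trigger_pct(override_pct=None, default_pct=DEFAULT_TRIGGER_PCT):
    """Return the percentage at which auto-compact would fire.

    An override can only move the trigger earlier; values above the
    default are clamped back down to it.
    """
    if override_pct is None:
        return default_pct
    return min(override_pct, default_pct)


def compaction_trigger_tokens(context_window, override_pct=None):
    """Convert the effective percentage into a token count."""
    return context_window * effective_trigger_pct(override_pct) // 100


if __name__ == "__main__":
    window = 200_000  # assumed context window size
    for pct in (None, 75, 95):
        tokens = compaction_trigger_tokens(window, pct)
        print(f"override={pct}: compacts near {tokens:,} tokens")
```

Under these assumptions, an override of 75 moves compaction from about 166,000 tokens to 150,000, while an override of 95 changes nothing.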
The Bash tool truncates output around 30,000 characters by default; raise it with BASH_MAX_OUTPUT_LENGTH (max 150,000). 06:38
- GitHub issue #19901 — anthropics/claude-code, Tools reference — Anthropic docs
- Verdict: Confirmed (note: the value is in characters, not “tokens” as the video says; a v2.1.2+ bug reportedly affected this — see issue #17944).
Deny rules in settings.json stop Claude reading directories it doesn’t need (node_modules, dist, lockfiles). 07:00
- Configure permissions — Anthropic docs
- Verdict: Confirmed (with caveat: multiple open issues report deny rules being inconsistently enforced — e.g. #27040).
/clear between unrelated tasks, plan mode, and right-sizing the model (Sonnet/Haiku/Opus) reduce usage. 09:10
- Manage costs effectively — Anthropic docs
- Verdict: Confirmed — all three are recommended verbatim in Anthropic’s official cost-management guidance.
Tools, Papers & Standards Mentioned
- Claude Code — the agent being optimized.
- Model Context Protocol (MCP) in Claude Code — server integration discussed as the main context cost.
- MCP Tool Search — the deferred-tool-definition feature that supersedes much of the video’s MCP concern.
- Claude Code permissions / deny rules: settings.json access control.
- CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: auto-compaction threshold env var.
- BASH_MAX_OUTPUT_LENGTH: Bash output limit env var.
- CLAUDE.md / memory and Skills: progressive-disclosure targets.
- Claude models — Haiku / Sonnet / Opus — model-selection guidance.
- BMAD / PRD planning frameworks — named at 09:46; BMAD-METHOD on GitHub is the canonical source. (“PRD” refers generically to product-requirements-document-driven planning, not a specific tool.)
- The “context audit skill” and “verified skills marketplace” referenced throughout are the creator’s own gated/upcoming offerings with no public canonical URL.
Follow-up Questions
- With MCP tool definitions now deferred by default, how much real per-session overhead remains from MCP servers in current Claude Code — and does disabling them still measurably help, or is it premature optimization?
- Is the claimed “40% token savings” from swapping MCPs for CLIs reproducible under a controlled benchmark, or does it depend heavily on which servers (Playwright, Apify) and workflows are involved?
- Given open issues about deny rules being inconsistently enforced, what’s the most reliable current method to keep Claude from reading large vendored directories: permissions.deny, .claudeignore-style mechanisms, or additionalDirectories scoping?
Sources
- https://blog.exe.dev/expensively-quadratic
- https://medium.com/@najeebkan/context-management-for-ai-agents-d68716a37965
- https://www.morphllm.com/context-rot
- https://www.mindstudio.ai/blog/claude-code-mcp-server-token-overhead
- https://code.claude.com/docs/en/mcp
- https://code.claude.com/docs/en/costs
- https://www.turboai.dev/blog/claude-autocompact-pct-override-guide
- https://github.com/anthropics/claude-code/issues/28728
- https://github.com/anthropics/claude-code/issues/19901
- https://github.com/anthropics/claude-code/issues/17944
- https://code.claude.com/docs/en/tools-reference
- https://code.claude.com/docs/en/permissions
- https://github.com/anthropics/claude-code/issues/27040
- https://code.claude.com/docs/en/overview
- https://code.claude.com/docs/en/memory
- https://code.claude.com/docs/en/skills
- https://docs.claude.com/en/docs/about-claude/models
- https://github.com/bmadcode/BMAD-METHOD