I Stopped Hitting Claude Code Usage Limits (Here's How)

Brad | AI & Automation · 11m 00s · 19 sources

Decision Card

Effort: A focused afternoon — run /context in a fresh session, then work down the list: /mcp to disable unused servers, swap one or two MCPs for their CLIs, split your CLAUDE.md into reference files, and add ~4 lines to settings.json (auto-compact override, BASH_MAX_OUTPUT_LENGTH, deny rules). Each change is a few minutes; the CLAUDE.md restructure is the longest part.

Honest take: The habits are sound, but the video’s headline villain is partly outdated: current Claude Code already defers MCP tool definitions by default (MCP Tool Search), so the “18k tokens per server on every message” framing describes the old behavior, not a fresh install today. The genuinely free wins here are the settings.json tweaks and /clear discipline; the rest is good hygiene dressed up as a secret. Also note the “audit skill” is gated behind an email-waitlist / Google Drive funnel and a forthcoming paid marketplace — the advice is more reliable than the lead magnet.

Concrete next steps (per item — adopt / try / skip):

  • adopt — /clear between unrelated tasks. Single highest-leverage habit; costs nothing. Confirmed by Anthropic’s cost docs.
  • adopt — Add to settings.json: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=75, BASH_MAX_OUTPUT_LENGTH=150000, and permissions.deny for node_modules, dist, lockfiles. ~10 min. See permissions docs.
  • adopt — Progressive disclosure for CLAUDE.md (keep it < ~200 lines, point to reference files). Endorsed verbatim by Anthropic.
  • try — Replace heavy MCPs (Playwright, Apify) with their CLIs where one exists. Real savings, but the “40%” figure is the author’s anecdote, not a measured benchmark.
  • try — Run /mcp and disable servers you won’t use this session. Helpful, but less urgent now that tool defs are deferred by default.
  • try — Plan mode (Shift+Tab) before non-trivial work; use /rewind instead of stacking corrections. Both are documented cost-savers.
  • skip the funnel — Don’t chase the “context audit skill” via the Google Drive link or join the “verified marketplace” waitlist; the same checklist is in Anthropic’s free cost docs and you can do it by hand.
  • Skip entirely if you run vanilla Claude Code with no MCP servers, a short CLAUDE.md, and few skills — your starting context is already small and most of this won’t move the needle.
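
The settings.json item above can be sketched as a small script that builds the fragment and prints it as JSON. This is a minimal sketch under assumptions: the source names only the two variables, their values, and the directories to deny. Placing the variables under an `env` key, the `Read(...)` rule syntax, and the specific lockfile name `package-lock.json` are illustrative choices that should be checked against the current Claude Code settings and permissions docs.

```python
import json

# Hypothetical settings fragment. The variable names and values come from the
# checklist above; the "env" nesting and the Read(...) rule syntax are
# assumptions to verify against the Claude Code documentation.
settings = {
    "env": {
        "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "75",
        "BASH_MAX_OUTPUT_LENGTH": "150000",
    },
    "permissions": {
        "deny": [
            "Read(./node_modules/**)",
            "Read(./dist/**)",
            "Read(./package-lock.json)",  # illustrative lockfile name
        ]
    },
}


def render_settings(fragment: dict) -> str:
    """Serialize the fragment as indented JSON, ready to merge by hand."""
    return json.dumps(fragment, indent=2)


print(render_settings(settings))
```

Merging the printed fragment into an existing settings.json by hand, rather than overwriting the file, keeps any settings already in place.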

TL;DR

A Claude-Code-specific guide to cutting “invisible” starting context — MCP servers, bloated CLAUDE.md files, verbose skills, and stale settings.json defaults — plus day-to-day habits like /clear, plan mode, and right-sizing the model. The advice is mostly aligned with Anthropic’s own cost guidance, though the video overstates MCP token cost for current versions and wraps the takeaways around a free-skill lead magnet and a paid marketplace waitlist.

Key Points

  • Every message re-sends the whole conversation, so cost compounds turn over turn — the author claims message 30 costs ~31× the first 00:42.
  • Bloated context isn’t just costlier; the model attends to the ends and “misses the middle,” so you pay more and get worse output 01:05.
  • Run /context in a fresh session to see the tokens you’re paying for before sending anything; the author measured 50,000+ tokens of baseline overhead 01:48.
  • MCP servers were his #1 cost; he disables unused ones via /mcp and replaces MCPs with CLIs, claiming ~40% token savings 02:42.
  • Audit CLAUDE.md for contradictions, cut rules that fail five filters, and apply progressive disclosure: keep universal rules, push the rest to reference files loaded on demand 03:20.
  • Skill metadata loads into context for every installed skill; verbose 400–800-line skills burn context and past a point Claude starts ignoring rules 05:16.
  • settings.json wins: lower the auto-compact trigger to ~75% before quality degrades, raise BASH_MAX_OUTPUT_LENGTH to 150,000 to avoid truncation-and-retry waste 06:20.
  • Add deny rules (git-ignore-style) so Claude can’t read node_modules, dist, and lockfiles it doesn’t need 07:00.
  • Habits: /clear between unrelated tasks, use plan mode before anything non-trivial, and don’t stack follow-up corrections — replace the bad exchange instead 09:10.
  • Right model for the job: Sonnet for most coding, Haiku for sub-agents and lookups, Opus for deep architecture 10:12.
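
The CLAUDE.md point above suggests a quick size check before restructuring. The following is a rough sketch, assuming the roughly 200-line ceiling mentioned in this summary is the budget; the function name `audit_claude_md` and the report fields are hypothetical, and line count is only a proxy for token cost.

```python
from pathlib import Path

MAX_LINES = 200  # rough ceiling taken from this summary, not a hard limit


def audit_claude_md(text: str, max_lines: int = MAX_LINES) -> dict:
    """Report how far a CLAUDE.md body is over the line budget."""
    lines = text.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    return {
        "total_lines": len(lines),
        "non_blank_lines": len(non_blank),
        "over_budget": len(lines) > max_lines,
        "excess_lines": max(0, len(lines) - max_lines),
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumes the script runs from the project root
    if path.exists():
        print(audit_claude_md(path.read_text(encoding="utf-8")))
    else:
        print("No CLAUDE.md in the current directory")
```

A file flagged as over budget is a candidate for progressive disclosure: keep the universal rules and move the rest into reference files.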

Notable Quotes

“message 30 actually costs 31 times more than your first message when you’re in a Claude code session.” 00:42

“a CLI only costs tokens when Claude actually calls that command, and an MCP server costs tokens just by existing in your session.” 02:53

“it’s not a limits problem, it’s a context hygiene problem and your setup drifts over time.” 10:25

Verified Claims

Each message re-sends the full conversation, so per-message input cost grows roughly linearly with turn count and cumulative session cost grows roughly quadratically. 00:42
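
The arithmetic behind this claim can be sketched with a toy model. It assumes every turn adds the same number of tokens and the whole history is re-sent each time, and it ignores prompt caching and baseline overhead, so the numbers are illustrative rather than a measurement of real sessions.

```python
def resend_costs(turns: int, tokens_per_turn: int = 1) -> tuple[list[int], int]:
    """Per-message and cumulative cost when each message re-sends all history."""
    # Message k carries k turns' worth of tokens (its own plus k - 1 earlier ones).
    per_message = [k * tokens_per_turn for k in range(1, turns + 1)]
    return per_message, sum(per_message)


per_message, total = resend_costs(30)
ratio = per_message[-1] / per_message[0]
print(ratio)  # 30.0
print(total)  # 465
```

Under these assumptions message 30 costs 30 times the first, the same order as the author's "31 times" figure, and the session total is 465 units compared with 30 if no history were re-sent. That gap is why /clear between unrelated tasks pays off.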

Bloated context degrades output because models attend to the start/end and lose the middle. 01:05

Each connected MCP server injects ~18,000 tokens of tool definitions into every message, and stacking servers can exceed 70,000 tokens of dead weight. 02:04

Replacing MCP servers with CLIs only costs tokens when the command runs, saving ~40%. 02:53

  • Reduce MCP server overhead — Anthropic docs
  • Verdict: Partially confirmed — Anthropic explicitly recommends preferring CLIs (gh, aws, gcloud) as more context-efficient; the specific “40%” number is the author’s anecdote, unverified.

Auto-compact triggers around 83% by default and can be lowered (e.g. to 75%) via an override. 06:20
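
To make the percentages concrete, the sketch below converts them into token counts. The 200,000-token context window is an assumption for illustration only; the source does not state a window size, and the function name `compact_trigger` is hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # assumed window size; not stated in the source


def compact_trigger(window_tokens: int, percent: int) -> int:
    """Token count at which auto-compact would fire for a given percentage."""
    return window_tokens * percent // 100


default_trigger = compact_trigger(CONTEXT_WINDOW_TOKENS, 83)
lowered_trigger = compact_trigger(CONTEXT_WINDOW_TOKENS, 75)
print(default_trigger, lowered_trigger, default_trigger - lowered_trigger)
# 166000 150000 16000
```

With that assumed window, lowering the trigger from 83% to 75% makes compaction fire about 16,000 tokens earlier.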

The Bash tool truncates output around 30,000 characters by default; raise it with BASH_MAX_OUTPUT_LENGTH (max 150,000). 06:38

Deny rules in settings.json stop Claude reading directories it doesn’t need (node_modules, dist, lockfiles). 07:00

/clear between unrelated tasks, plan mode, and right-sizing the model (Sonnet/Haiku/Opus) reduce usage. 09:10

Follow-up Questions

  1. With MCP tool definitions now deferred by default, how much real per-session overhead remains from MCP servers in current Claude Code — and does disabling them still measurably help, or is it premature optimization?
  2. Is the claimed “40% token savings” from swapping MCPs for CLIs reproducible under a controlled benchmark, or does it depend heavily on which servers (Playwright, Apify) and workflows are involved?
  3. Given open issues about deny rules being inconsistently enforced, what’s the most reliable current method to keep Claude from reading large vendored directories — permissions.deny, .claudeignore-style mechanisms, or additionalDirectories scoping?

Sources