Ollama + Claude Code = 99% CHEAPER
Decision Card
Effort: Evening project (~30–60 min) — install Ollama, run ollama pull qwen3.5 then ollama launch claude; OR make an OpenRouter account, drop $10 in for the higher rate limit, and paste four env vars into .claude/settings.local.json.
Honest take: The title says “free,” but the video itself walks that back twice. The Ollama-launch path requires buying $5 in Anthropic credits that you “never consume” (a workaround, not free), and good local models need hardware you may not have. The genuinely usable route ends up being OpenRouter, either with the $10 credit top-up for the higher rate limit or with a cheap paid model, which is “cheaper,” not free.
Concrete next steps:
- Read the official Ollama → Claude Code guide and run `ollama launch claude` against a local model (~20 min, assumes 16GB+ RAM).
- If local is too slow, follow the OpenRouter ↔ Claude Code env-var setup and set `ANTHROPIC_DEFAULT_HAIKU_MODEL` too, not just `ANTHROPIC_MODEL`, so background calls don’t bill Anthropic (~15 min).
- Skip if you already have a Claude Pro/Max subscription and your main use is high-stakes coding: local 9B models are markedly slower and weaker, and the video admits Opus is still the safe choice for “stuff you can’t mess up.”
TL;DR
Two ways to swap Claude Code’s “engine” for free/cheap open-weight models: run one locally via Ollama, or route through OpenRouter’s free/cheap models by overriding Claude Code’s API environment variables. Both work, and both rely on an officially supported env-var mechanism, although third-party model use is tolerated rather than explicitly endorsed by Anthropic. “Free” is also qualified: local needs hardware, the cloud paths have rate limits or per-token costs, and quality/speed trail the native Claude models.
Key Points
- Claude Code is a “harness” (the car) wrapped around a model (the engine); you can swap the engine for an open-weight model while keeping Anthropic’s agent harness. 00:33
- You can’t run Opus locally because it’s closed-source — frontier models like Sonnet/GPT/Gemini are only reachable through paid APIs. 01:37
- The open vs. closed performance gap (measured on SWE-bench Verified) is shrinking, and some open-weight models now beat older Claude Sonnet 3.7. 02:46
- Method 1: install Ollama, `ollama pull` a model (demo uses a ~6.6GB 9B Qwen 3.5), then `ollama launch claude` to pick that model inside Claude Code. 05:09
- The local path still requires buying $5 of Anthropic API credits to satisfy onboarding, though you supposedly never spend them once routed to a local model. 09:56
- Ollama may default a model’s context window to less than the 200k that Claude Code displays; creating a custom model with a larger context (e.g. 64k) fixes the broken state/tool visibility this causes. 12:11
- Ollama Cloud (e.g. MiniMax M2.7) gives bigger, faster models with no local hardware, but reintroduces a pricing/usage barrier and is no longer 100% private. 13:39
- Best fit for open models: low-stakes/high-volume work — summarizing, grepping, scaffolding, triage, simple tests — and as a fallback when Claude is down or you’ve hit session limits. 16:18
- Method 2: set env vars in `.claude/settings.local.json` pointing the base URL to OpenRouter, putting your OpenRouter key in the Anthropic auth token field, and leaving the Anthropic API key blank. 17:25
- Critical gotcha: you must override the Haiku/Sonnet/Opus model variables, not just the default; otherwise tool calls silently fall back to paid Anthropic Haiku and charge you. 19:54
- OpenRouter free tier is 50 requests/day; loading $10 of credits raises it to 1,000/day without consuming the credits. 18:03
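Method 2 above can be sketched as a small Python script that builds the override block for `.claude/settings.local.json`. This is a minimal sketch under stated assumptions, not the video's exact file: the `env` wrapper key, the base URL value, the Sonnet/Opus variable names, and the placeholder key and model ID are all assumptions to check against the OpenRouter and Claude Code docs.

```python
import json


def build_settings(openrouter_key: str, model_id: str) -> dict:
    """Return a settings dict that routes Claude Code through OpenRouter."""
    return {
        "env": {
            # Assumed OpenRouter endpoint; confirm it in OpenRouter's docs.
            "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
            # The OpenRouter key goes in the Anthropic auth token field.
            "ANTHROPIC_AUTH_TOKEN": openrouter_key,
            # Left blank on purpose so no Anthropic key is used.
            "ANTHROPIC_API_KEY": "",
            # Override the main model and every tier, so background
            # calls do not fall back to a paid Anthropic model.
            "ANTHROPIC_MODEL": model_id,
            "ANTHROPIC_DEFAULT_HAIKU_MODEL": model_id,
            "ANTHROPIC_DEFAULT_SONNET_MODEL": model_id,
            "ANTHROPIC_DEFAULT_OPUS_MODEL": model_id,
        }
    }


if __name__ == "__main__":
    # Both arguments are hypothetical placeholders.
    settings = build_settings("sk-or-REPLACE_ME", "example/open-model")
    print(json.dumps(settings, indent=2))
```

The printed JSON is meant to be pasted into the settings file by hand, which keeps the real key out of the script.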
Notable Quotes
“When we use Claude code natively, we are using basically a harness that is wrapped around Opus or Sonnet or one of Claude’s models… So, Claude code tells the model how to organize its folders and how to use tools.” 00:43
“There’s like there’s really no such thing as free because if you want to run a really good model locally, then you need the hardware to support it.” 14:32
“If you would have just put in this configuration, and you would have left out these variables down here, then it would have, by default, used Sonnet or Haiku for basically all of these things, and it would have charged you without you even knowing.” 20:24
Verified Claims
Claim: Claude Code can be pointed at a different backend by overriding ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN and leaving ANTHROPIC_API_KEY blank. 17:25
- Use custom LLM providers in Claude Code (Xin Fu), Claude Code model configuration docs
- Verdict: Confirmed (the official docs note `ANTHROPIC_BASE_URL` changes where requests go, not which model answers, which is exactly the swap described).
Claim: You must set the Haiku/Sonnet/Opus default-model variables, not just the main model, or background/tool calls fall back to paid Anthropic models. 19:54
- Claude Code model configuration docs
- Verdict: Confirmed (`ANTHROPIC_DEFAULT_HAIKU_MODEL` explicitly controls “background functionality” per the docs).
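The fallback risk in this claim can be checked mechanically before launching Claude Code. The sketch below is a hypothetical helper, not part of any official tooling; it assumes the Sonnet and Opus variables follow the same naming pattern as `ANTHROPIC_DEFAULT_HAIKU_MODEL`, and that the overrides live in a plain dict of env vars.

```python
# Variables that should all point at the replacement model.
# The Sonnet/Opus names are assumed by analogy with the Haiku one.
REQUIRED_MODEL_VARS = (
    "ANTHROPIC_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
)


def missing_model_overrides(env: dict) -> list:
    """Return the model variables that are absent or empty in env."""
    return [name for name in REQUIRED_MODEL_VARS if not env.get(name)]


if __name__ == "__main__":
    # Only the main model is set, which is the mistake the video warns about.
    partial = {"ANTHROPIC_MODEL": "example/open-model"}
    for name in missing_model_overrides(partial):
        print(f"not overridden: {name}")
```

An empty result means every tier is overridden; any name in the list is a variable that could still resolve to a paid Anthropic model.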
Claim: Ollama integrates directly with Claude Code via an ollama launch claude command, no proxy needed. 09:01
- Ollama → Claude Code docs, ollama launch blog
- Verdict: Confirmed (Ollama speaks the Anthropic Messages API since v0.14; `ollama launch claude` is a documented command).
Claim: Claude Code needs a large context window; small/default Ollama context breaks it, ~64k recommended. 12:15
- Ollama → Claude Code docs
- Verdict: Confirmed (Ollama docs recommend “at least 64k tokens” and adjusting `num_ctx`).
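One way to apply the `num_ctx` adjustment is a custom Ollama model defined by a two-line Modelfile. The sketch below only generates that text; the base model tag `qwen3.5` is the one used in the video's pull command, 65536 is an assumed reading of "64k", and the output filename and new model name in the usage note are hypothetical.

```python
def modelfile_with_context(base_model: str, num_ctx: int = 65536) -> str:
    """Build Modelfile text that derives a model with a larger context."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive token count")
    # FROM names the base model; PARAMETER num_ctx sets the context size.
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    print(modelfile_with_context("qwen3.5"), end="")
```

Saving the output as `Modelfile` and running `ollama create qwen3.5-64k -f Modelfile` should register the larger-context variant, which can then be selected like any other local model.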
Claim: OpenRouter free models are limited to 50 requests/day, rising to 1,000/day after purchasing ≥$10 in credits. 18:03
- OpenRouter FAQ, OpenRouter rate limits
- Verdict: Confirmed (under 10 credits = 50/day; ≥10 credits = 1,000/day; `:free` variants also capped at ~20 req/min).
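The two daily caps translate directly into how long a batch of requests takes on free models. This is a rough sketch using only the figures quoted in this verdict; the function names are hypothetical, and it ignores the per-minute cap and any failed or retried requests.

```python
FREE_DAILY_CAP = 50        # fewer than 10 credits purchased
FUNDED_DAILY_CAP = 1000    # 10 or more credits purchased
CREDIT_THRESHOLD = 10


def daily_free_model_cap(credits_purchased: float) -> int:
    """Return the daily request cap on free models for an account."""
    if credits_purchased >= CREDIT_THRESHOLD:
        return FUNDED_DAILY_CAP
    return FREE_DAILY_CAP


def days_to_finish(total_requests: int, credits_purchased: float) -> int:
    """Return how many days a batch of requests needs under the cap."""
    cap = daily_free_model_cap(credits_purchased)
    # Ceiling division without importing math.
    return -(-total_requests // cap)


if __name__ == "__main__":
    for credits in (0, 10):
        print(credits, daily_free_model_cap(credits), days_to_finish(120, credits))
```

Since a single agent task can issue many requests, the 50/day tier is likely to run out quickly, which is the practical argument for the $10 top-up.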
Claim: Using non-Anthropic models in Claude Code is allowed / not against Anthropic’s ToS. 04:50
- MindStudio: run Claude Code free with Ollama & OpenRouter
- Verdict: Inconclusive (the env-var mechanism is officially supported and widely used, but third-party model use is functionally allowed rather than explicitly endorsed — “not promoted” is the more accurate framing than “definitely fine”).
Claim: The open-source vs. closed-source gap on SWE-bench Verified is shrinking, with some open models beating Sonnet 3.7. 02:46
- MindStudio: Qwen 3.6 / Kimi K2.6 closing the frontier gap, LLM Benchmarks 2026
- Verdict: Confirmed (by 2026 open models like MiniMax M2.5 ~80.2% and DeepSeek V4-Pro ~80.6% are within ~0.2 pts of Claude Opus 4.6 on SWE-bench Verified).
Tools, Papers & Standards Mentioned
- Claude Code: Anthropic’s terminal coding agent (the “harness”).
- Ollama: local LLM runner; see the Claude Code integration and `ollama launch`.
- OpenRouter: model-routing API; free models, pricing, FAQ/rate limits.
- SWE-bench Verified: the coding benchmark behind the comparison charts.
- Model families referenced (open-weight): Qwen, GLM, MiniMax, Google Gemma; see open-source coding LLM landscape 2026.
- `ANTHROPIC_BASE_URL` / model env vars: the configuration surface the whole video depends on.
Follow-up Questions
- With the cheap-OpenRouter setup (Gemma 4 at ~$0.14/$0.40 per M tokens), what’s the real end-to-end cost of a typical coding session versus a Claude Pro/Max subscription — is “50–100× cheaper” accurate once tool-call overhead is included?
- Which specific local models clear Claude Code’s tool-calling + 64k-context bar on consumer hardware (16–24GB RAM/VRAM), and how much does quality degrade versus Ollama Cloud or OpenRouter equivalents?
- Since open models often lack native web search, what’s the cleanest way to wire Brave/Tavily/Perplexity MCP tools into a Claude-Code-on-open-model setup so research tasks still work?
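The cost question above reduces to simple per-token arithmetic once a session's token counts are known. The sketch below assumes the quoted $0.14 is the input price and $0.40 the output price per million tokens; the session size in the example is a made-up illustration, not a measurement.

```python
def session_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.14,   # assumed: input price per 1M tokens
    output_price_per_m: float = 0.40,  # assumed: output price per 1M tokens
) -> float:
    """Estimate the cost of one session from its token counts."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical session: 2M input tokens (tool-call overhead included)
    # and 200k output tokens.
    print(f"${session_cost_usd(2_000_000, 200_000):.2f}")
```

Agent sessions tend to be input-heavy because every tool call resends context, so the input price usually dominates; comparing against a subscription requires real token counts from a usage log.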
Sources
- https://docs.ollama.com/integrations/claude-code
- https://ollama.com/blog/launch
- https://code.claude.com/docs/en/model-config
- https://code.claude.com/docs/
- https://imfing.com/til/use-custom-llm-providers-in-claude-code/
- https://openrouter.ai
- https://openrouter.ai/openrouter/free
- https://openrouter.ai/pricing
- https://openrouter.ai/docs/faq
- https://openrouter.zendesk.com/hc/en-us/articles/39501163636379-OpenRouter-Rate-Limits-What-You-Need-to-Know
- https://www.mindstudio.ai/blog/how-to-run-claude-code-free-ollama-open-router
- https://www.mindstudio.ai/blog/kimmy-k2-6-qwen-3-6-open-source-frontier-models
- https://www.mindstudio.ai/blog/best-open-source-llms-agentic-coding-2026
- https://iternal.ai/llm-selection-guide
- https://www.swebench.com/