Testing Qwen 3.6 Locally on Easy, Medium, and Hard Coding Tasks
Decision Card
Effort: A weekend setup. Install Ollama or LM Studio, run ollama pull qwen3.6 (the 27B dense build needs a 24 GB GPU or Mac; the 35B-A3B MoE runs in ~12 GB VRAM), point an editor such as Zed at the local server, and give it a single-file web app task to see whether local-only coding fits your workflow.
Honest take: The video’s headline finding (pick the smaller 27B dense model over the 35B MoE for coding) holds up and is benchmark-backed, but every test is a self-contained single-HTML-file toy (todo list, sorting visualizer, Kanban). That probes “can it generate a greenfield page,” not the “real limits” the author claims to be testing. Multi-file repos and debugging existing code are where local models actually struggle, and the video tests neither; it also glosses over the quality-vs-speed tradeoff between the two variants.
Concrete next steps:
- Install Ollama and pull the 27B dense variant; run the same three escalating tasks (todo → visualizer → Kanban) yourself (~1–2 hrs incl. a long Kanban run).
- Compare against the 35B-A3B MoE on your hardware to confirm the “3× faster but lower coding quality” tradeoff before committing; the Hugging Face model card lists the benchmark figures (~30 min).
- Test on a real multi-file repo task, not a single HTML file, since that’s the gap the video never closes (~1 hr).
- Skip if you don’t have enough GPU memory: about 24 GB (GPU or Mac) for the 27B dense build, ~12 GB of VRAM for the MoE. The author’s whole setup hinges on offloading to a desktop with a big-VRAM card, and a CPU or integrated-GPU run will be painfully slow.
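To put a number on the speed comparison in the steps above, one option is to call the local Ollama server directly and read the token counts it reports. The sketch below is a minimal, unofficial example. It assumes Ollama is serving on its default local port, and the model tags you pass in are placeholders: the source only shows ollama pull qwen3.6, so check ollama list for the exact tags on your machine. compare_models is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: a default local Ollama install listening on port 11434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run_prompt(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one prompt to a running Ollama server and return the parsed JSON reply."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def compare_models(tags: list, prompt: str) -> dict:
    """Run the same prompt against each model tag and collect tokens per second."""
    return {tag: tokens_per_second(run_prompt(tag, prompt)) for tag in tags}
```

Calling compare_models with the tags of both pulled variants and the todo-app prompt should show whether the roughly 3× gap holds on your hardware. Speed alone says nothing about output quality, so the generated pages still need to be opened and checked by hand.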
TL;DR
A creator runs Qwen 3.6 locally on easy, medium, and hard coding tasks, choosing the 27B dense variant because it outscores the larger 35B mixture-of-experts model on software-development benchmarks. It produces a working todo app and sorting visualizer in one shot, plus a Kanban board built phase by phase from its own plan, all without errors, which he calls an outstanding result for a small local model.
Key Points
- The 35B and 27B Qwen 3.6 models differ in architecture, not just size — 35B is mixture-of-experts (MoE), 27B is traditional dense — so picking by parameter count alone is misleading. 00:30
- Surprisingly, the larger 35B MoE model runs ~3× faster on his machine than the smaller 27B dense model. 01:05
- The smaller 27B model scored better on software-development benchmarks, so he chose it for coding work. 01:14
- He frames local models as a complement to paid models (Opus, Gemini) for simpler tasks or private data — not a replacement. 01:44
- Hardware matters: he runs the model on a desktop with a high-VRAM discrete GPU and connects from a MacBook over the local network; VRAM is the single most important factor. 02:23
- Easy task (single-file todo web app): ~5 minutes to generate, worked correctly in-browser. 04:11
- Medium task (sorting visualizer with six algorithms, deliberately under-specified): ~20 minutes, nearly 1,000 lines, all functional. 05:42
- Hard task (Kanban board): he first had Qwen generate a detailed multi-phase plan, then implemented it one phase at a time over ~1 hour total — working on the first try. 07:06
- He warns that a constantly-running local model can noticeably raise your electricity bill — an overlooked cost. 05:27
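The plan-then-implement workflow used for the Kanban task is done interactively in the editor in the video. A scripted approximation might look like the sketch below. Everything in it is an assumption for illustration: the prompt wording, the 'PHASE:' line convention, and the generate callable (any function that sends a prompt to the model and returns its text) are hypothetical, not the author's setup.

```python
from typing import Callable, List


def parse_phases(plan_text: str) -> List[str]:
    """Extract phase descriptions from lines starting with 'PHASE:' (a made-up convention)."""
    phases = []
    for line in plan_text.splitlines():
        line = line.strip()
        if line.upper().startswith("PHASE:"):
            phases.append(line[len("PHASE:"):].strip())
    return phases


def plan_then_implement(task: str, generate: Callable[[str], str]) -> str:
    """Ask for a plan once, then request one phase at a time, carrying the code forward."""
    plan = generate(
        "Break this task into phases, one per line, "
        f"each line starting with 'PHASE:'.\n{task}"
    )
    code = ""
    for phase in parse_phases(plan):
        # Each step sees the code so far and is asked to add only the current phase.
        code = generate(
            f"Task: {task}\nCurrent code:\n{code}\n"
            f"Implement only this phase and return the full updated file: {phase}"
        )
    return code
```

Keeping generate as a parameter means the same loop can be pointed at a local server or at a stub for dry runs, and the single-shot alternative is just one call to generate with the task itself.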
Notable Quotes
“The larger model, the 35 billion version, runs almost three times faster on my machine than the smaller 27 billion model.” 01:05
“For me, local models are not a complete replacement for paid models. I see them as a complement to paid models for situations where I need to handle less complex tasks, or when I don’t want my data leaving my computer.” 01:44
“For a relatively small model running locally, this is an outstanding result. I did not expect the model to succeed on the first try without any errors.” 08:51
Verified Claims
- The 35B Qwen 3.6 is a mixture-of-experts model and the 27B is dense. 00:30 — The official model card confirms Qwen3.6-35B-A3B is MoE (35B total, ~3B activated, 256 experts), while the 27B is the dense variant. Qwen3.6-35B-A3B (Hugging Face), Qwen3.6-27B (Hugging Face) — Confirmed.
- The larger 35B MoE runs ~3× faster than the smaller 27B dense. 01:05 — Independent comparisons report the 35B-A3B is “3–4× faster in everything” because MoE activates only ~3B parameters per token; this is the expected dense-vs-MoE tradeoff. aimadetools: 27B vs 35B-A3B — Confirmed.
- The smaller 27B model performs better on software-development benchmarks. 01:14 — Benchmark tables show the 27B dense leading SWE-bench Verified (~75–77 vs ~73.4 for the MoE) and outperforming the MoE on most coding benchmarks. zoliben: 35B vs 27B benchmarks, Qwen3.6-35B-A3B card — Confirmed (with nuance: the MoE actually wins on Terminal-Bench 2.0, so “better on software dev” isn’t universal).
- VRAM is the most important factor for running the model locally. 02:39 — Setup guides consistently state the 27B dense needs ~24 GB VRAM (RTX 4090 / 24 GB Mac) while the MoE fits in ~12 GB, making VRAM the gating resource. codersera: Run Qwen 3.6 Locally — Confirmed.
- Qwen 3.6 is open-weight and runnable locally via standard tooling. 02:48 — Models are Apache 2.0 licensed and available through Ollama, llama.cpp, vLLM, and sglang. Ollama: qwen3.6, QwenLM/Qwen3.6 (GitHub) — Confirmed.
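The memory and speed claims above can be sanity-checked with back-of-envelope arithmetic. The sketch below estimates the memory taken by the weights alone and the ratio of parameters used per token. It assumes a quantization level (bits per weight) that the sources do not state, and it ignores the KV cache and runtime overhead, so treat the results as rough lower bounds on memory, not measurements.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def active_param_ratio(dense_params_billion: float, moe_active_billion: float) -> float:
    """How many times more parameters the dense model uses per token than the MoE."""
    return dense_params_billion / moe_active_billion


# Assumption: 4-bit quantization. The parameter counts are the ones quoted above.
dense_27b_gb = weight_memory_gb(27, 4)
moe_35b_gb = weight_memory_gb(35, 4)
ratio = active_param_ratio(27, 3)
```

Under that 4-bit assumption the 27B weights come to 13.5 GB, which leaves headroom for context on a 24 GB card. The full 35B MoE weights come to 17.5 GB, so the ~12 GB figure presumably depends on something this estimate does not model, such as keeping part of the weights in system RAM; that is an inference, not something the cited guides state. The 9× ratio of active parameters is a loose ceiling on the speed gap, and the reported 3 to 4× is what to expect in practice.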
Tools, Papers & Standards Mentioned
- Qwen 3.6 (27B dense / 35B-A3B MoE) — GitHub, Hugging Face: Qwen3.6-27B, Hugging Face: Qwen3.6-35B-A3B
- Ollama (local model runtime referenced for local serving) — ollama.com/library/qwen3.6
- Zed Editor (“Z Editor” in the transcript — the editor he uses, configured to connect to a remote local model) — zed.dev
- Mixture-of-Experts / dense architecture background — Qwen3 Technical Report (arXiv)
Follow-up Questions
- How does the 27B dense model perform on multi-file, existing-codebase tasks (refactoring, debugging) rather than greenfield single-file generation — the scenario the video never tests?
- What is the real total-cost-of-ownership of local inference (hardware amortization + electricity) versus paying per-token for a frontier API for the same workload?
- Does the “plan first, then implement phase-by-phase” workflow he used on the Kanban board materially improve output quality, or would a single-shot prompt have succeeded too?
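The total-cost-of-ownership question can at least be framed with simple arithmetic. The sketch below is an illustrative calculator, not a measurement: every input (hardware price, amortization period, wattage, hours, electricity rate, API price) is a placeholder to replace with your own figures, and none of the example numbers come from the video.

```python
def electricity_cost(avg_watts: float, hours: float, price_per_kwh: float) -> float:
    """Energy cost of running the machine: kW x hours x price per kWh."""
    return avg_watts / 1000 * hours * price_per_kwh


def local_monthly_cost(
    hardware_price: float,
    amortization_months: int,
    avg_watts: float,
    hours_per_month: float,
    price_per_kwh: float,
) -> float:
    """Straight-line hardware amortization plus electricity for one month."""
    hardware_share = hardware_price / amortization_months
    return hardware_share + electricity_cost(avg_watts, hours_per_month, price_per_kwh)


def api_monthly_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Per-token API spend for the same month, at a single blended token price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


# Placeholder inputs only; substitute your own hardware, tariff, and usage.
local = local_monthly_cost(1800, 36, 350, 100, 0.30)
api = api_monthly_cost(20_000_000, 3.0)
```

Comparing local and api for your own workload shows where the break-even sits. The model is deliberately crude: it uses one blended token price and ignores idle power draw, resale value, and the quality difference between a local model and a frontier one.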