Testing Qwen 3.6 Locally on Easy, Medium, and Hard Coding Tasks

Zero to MVP · 9m 19s · Watch on YouTube · 10 sources

Decision Card

Effort: One evening (2–4 hours) if you already own a 24GB-class GPU — pull the Qwen3.6-27B Q4 GGUF via Ollama or llama.cpp (~17GB VRAM), point Zed’s local-model provider at the endpoint, and rerun one of his three test tiers yourself.

Honest take: The video’s “success” bar is clicking around single-file HTML apps in a browser — no tests, no multi-file repos, no existing codebase — so it demonstrates greenfield toy-app generation, not the agentic SWE-bench-style work the 27B model is actually benchmarked on; and the author accepts a ~3x speed penalty (5 min for a to-do list, 1 hour for a Kanban board) that independent comparisons say makes the 35B-A3B MoE the better daily-driver default for interactive coding.

Concrete next steps:

Check the official benchmark deltas before picking a variant — 27B dense vs 35B-A3B on SWE-bench Verified and Terminal-Bench at github.com/QwenLM/Qwen3.6 (~15 min).
If you have a 24GB GPU, run 35B-A3B at Q4 first (~80–120 t/s vs ~25–35 t/s for the dense model per this comparison) and only fall back to 27B if output quality disappoints (~1 hour).
Wire it into Zed following zed.dev/docs/ai/use-a-local-model, including the remote-machine setup the video uses (~30 min).
Skip if you don’t have a ≥16GB-VRAM GPU (or ≥24GB unified-memory Mac), or if your work is multi-file changes in existing codebases — this video provides no evidence for that use case.

TL;DR

The creator tests Qwen 3.6 27B (dense) locally in Zed on three coding tasks — a to-do app (5 min, pass), a sorting-algorithm visualizer (20 min, pass), and a plan-then-implement Kanban board (~1 hour, pass first try) — and concludes it’s good enough to use regularly as a complement to paid models. The surprising setup finding: the larger 35B MoE variant runs ~3x faster on his hardware, but the smaller 27B dense model benchmarks better on software-development tasks, so he picks the slower one.

Key Points

Qwen 3.6 ships in two variants that differ in architecture, not just size: 35B mixture-of-experts vs 27B dense, so picking “the biggest number” is the wrong heuristic. 00:19
On his machine the 35B MoE runs almost 3x faster than the 27B dense model, because MoE activates only a subset of expert sub-networks per token. 01:05
Per the creators’ published benchmarks, the smaller 27B dense model beats the 35B MoE on software-development tasks, which is why he standardizes on it. 01:14
He frames local models as a complement to paid models — for simpler tasks and for data that must not leave the machine — not a replacement for Opus or Gemini. 01:44
Setup: model runs on a desktop PC with a dedicated GPU, accessed over the local network from a MacBook via Zed editor; VRAM capacity is called the single most important hardware factor. 02:29
Under load the model fits entirely in VRAM with headroom for a large context window; GPU is fully utilized while CPU and system RAM stay nearly idle. 03:41
Easy task (single-file HTML to-do app): ~5 minutes, fully working on first attempt. 04:11
Medium task (six-algorithm sorting visualizer from a deliberately vague spec): ~20 minutes, ~1,000 lines in one file, everything matched the spec. 05:43
Hard task (Kanban board): he first had the model write a phased implementation plan (~5 min), then fed it one phase at a time; all six phases took ~1 hour total and worked first try. 06:52
He flags electricity cost as an overlooked expense of running a local model near-constantly. 05:32

Notable Quotes

“The real question is more like this: How complex of a task can I give Qwen 3.6 and still be confident that it will handle it successfully?” 02:02

“The larger model, the 35 billion version, runs almost three times faster on my machine than the smaller 27 billion model.” 01:05

“For a relatively small model running locally, this is an outstanding result. I did not expect the model to succeed on the first try without any errors.” 08:51

Verified Claims

Qwen 3.6 exists in a 35B mixture-of-experts variant and a 27B dense variant. 00:19 — Sources: QwenLM/Qwen3.6 on GitHub (Qwen3.6-35B-A3B released April 16, 2026; Qwen3.6-27B dense released April 22, 2026, both Apache 2.0, ~262K context), Qwen blog: Qwen3.6-27B. Verdict: Confirmed.
The 35B MoE runs roughly 3x faster than the 27B dense model on the same hardware. 01:05 — Sources: aimadetools dense-vs-MoE comparison reports ~80–120 t/s (MoE) vs ~25–35 t/s (dense) on an RTX 4090 at Q4 — a 3–5x gap — because the MoE activates only ~3B of its 35B parameters per token. Verdict: Confirmed.
The smaller 27B dense model outperforms the 35B MoE on software-development benchmarks. 01:14 — Sources: Qwen blog: Qwen3.6-27B, aimadetools comparison (SWE-bench Verified 77.2% vs 73.4%; Terminal-Bench 59.3% vs 48.2%). Verdict: Confirmed.
MoE means specialized expert sub-networks with only some activated per token; dense means all parameters active for every token. 00:30 — Sources: Qwen blog: Qwen3.6-35B-A3B (the “A3B” suffix denotes ~3B active parameters), codersera local-run guide. Verdict: Confirmed.
GPU VRAM is the most important hardware factor for running these models locally. 02:42 — Sources: codersera guide (27B needs ~17GB at Q4_K_M, 35B-A3B ~19–22GB; a single 24GB GPU runs either, and long context inflates memory further). Verdict: Confirmed.
Zed editor can connect to local models running on another machine on the local network. 02:50 — Sources: Zed docs: Use a Local Model, Zed docs: LLM Providers (Ollama and OpenAI-compatible endpoints, including remote URLs). Verdict: Confirmed.
Both variants fit in a consumer GPU’s VRAM with room left for a large context window. 03:41 — Sources: codersera guide — true at Q4 on a 24GB card with modest context; the video doesn’t state its quantization, and long-context use can exceed headroom. Verdict: Inconclusive (plausible but under-specified in the video).

Tools, Papers & Standards Mentioned

Qwen 3.6 (27B dense, 35B-A3B MoE) — github.com/QwenLM/Qwen3.6; official announcements: Qwen3.6-27B, Qwen3.6-35B-A3B
Zed editor (local/agent AI) — zed.dev/docs/ai/use-a-local-model, zed.dev/docs/ai/llm-providers
Ollama (Zed integration, common local runtime) — docs.ollama.com/integrations/zed
Claude Opus (mentioned as paid-model comparison) — anthropic.com/claude
Gemini (mentioned as paid-model comparison) — gemini.google.com

Follow-up Questions

How does Qwen3.6-27B perform on existing multi-file codebases with agentic edit loops (the SWE-bench-style scenario), rather than single-file greenfield HTML apps — and does the 1-hour Kanban time balloon further there?
At what quantization level (Q4/Q5/Q6) does the 27B’s coding-quality edge over the 35B-A3B erode — is a Q6 MoE better than a Q4 dense model on the same 24GB card?
What is the real cost crossover — electricity plus hardware depreciation for near-constant local inference vs API pricing for a comparable-quality cloud model?