An r/LocalLLaMA post last week hit close to 1,000 upvotes for a niche claim: Qwen 3.6 27B, running on a Mac with a technique called MTP, hit 28 tokens per second. The author called it "finally a viable option for local agentic coding."
Translated: a model you run on your laptop is fast enough to power a real coding assistant. Like Claude Code, but offline, free per token, with a 262k context window. For Malaysian dev teams handling sensitive client code, this is the moment local AI coding stopped being a science project.
Can you really run Claude Code with a local LLM?
Three things converged.
Qwen 3.6 27B is Alibaba's open-weight dense coding model, 27.8B parameters, trained for the multi-step tool-calling pattern that powers Cursor and Cline.
MTP (Multi-Token Prediction) lets the model predict several tokens ahead (Qwen 3.6 uses 5 heads, 3 verified per pass) and check them in one shot. Code is patterned, so most predictions land: 2.5x speedup at 83% acceptance, no measured quality loss.
Drop-in API compatibility. Builds expose OpenAI and Anthropic-compatible endpoints. Set ANTHROPIC_BASE_URL=http://127.0.0.1:8081 and Claude Code routes through your local model.
The speed barrier just broke
The speed barrier for local agentic coding just broke. Image: Gotchaa Lab
Local coding models have been stuck around 10 to 15 tokens per second. Agentic coding is brutal: the AI makes 20 to 30 tool calls per task, each waiting for a full response. At 12 tok/s, a single bug fix means minutes of spinner-staring.
28 tok/s is where the experience starts feeling like Cursor. Not faster than the cloud, but fast enough to stop being annoying. The 262k context window matters as much. At 48GB of unified RAM with the Q5_K_M quant, Qwen 3.6 27B fits roughly 200,000 lines of code. A real Laravel monolith, not a demo.
The honest catch
Local AI coding is usable now. It is not yet Claude Opus 4.7.
On benchmarks, Qwen 3.6 27B is closer to frontier than people expect: 77.2% on SWE-bench Verified (Claude Opus 4.6 sits at 80.8%) and 59.3% on Terminal-Bench 2.0, matching Claude 4.5 Opus. For boilerplate, refactoring, tests, and CRUD, you will not feel the gap.
The gap shows up on judgment work: unfamiliar codebases, tricky concurrency, hard architectural calls. Frontier Claude still wins those. The Reddit poster was also blunt: Q8_0 is the real sweet spot, and the popular claim that 6-bit is "almost as good as 8-bit" is, in their testing, not true.
Why this matters for Malaysian teams
Two reasons that map to real client conversations.
PDPA and client-confidential code. Banks, GLCs, and healthcare clients increasingly write code-sharing prohibitions into vendor contracts. Sending source code to OpenAI or Anthropic is a cross-border data transfer under the PDPA 2024 amendment. Your options were a weaker local model, on-prem GPUs, or declining the engagement. A Mac Studio with 96GB unified memory now sits around RM20,000 to RM25,000. For agencies with two or three engineers on sensitive work, that pays for itself fast.
API cost math changes. Anthropic recently doubled its estimate of Claude Code spend to roughly USD 13 per developer per day, heavier users USD 20+. For a five-person KL team, that is roughly RM13,000 to RM20,000 per month. A RM20,000 to RM25,000 Mac Studio pays back in one to two months, with zero rate limits.
Same two-tier strategy from our AI race post: local for volume work, frontier for judgment-heavy decisions worth the data residency trade-off.
Is local AI coding good enough to replace Claude Code?
Not yet, for most teams. The setup needs comfort with command-line tools, llama.cpp compilation, and patience for quirks. The four chat template fixes the maintainer ships are required for tool-calling to work in C++ runtimes. If your idea of "AI coding setup" is a VSCode extension, this is not yet for you.
Within 12 months, expect this stack to be one-click installable via LM Studio or Ollama. The pragmatic move now: keep paying for Claude Code, but track your AI spend and ask which clients value data-residency. When the next Qwen ships, the agencies ready with local-first delivery will move first.
Our take
Tools like Qwen 3.6 27B make the mechanical part of software faster and cheaper. What they do not handle is judgment: which architecture survives three years, how it fits an existing LHDN integration, what the PDPA exposure really is, who is accountable when it breaks at 2am.
The mechanical part is getting commoditised. The judgment is not.
Thinking about AI coding under PDPA constraints, or about custom software built local-first? Let's chat. No sales pitch.




