Cohere’s North Mini Code landed on June 9, 2026, with a straightforward pitch to developers: a 30 billion parameter agentic coding model, Apache 2.0 licensed, that fits on a single NVIDIA H100. Independent testing by Artificial Analysis ranks it 8th of 127 comparable open-weight models on output speed at 210 tokens per second, with a time to first token of 0.25 seconds against a class median of 1.95 seconds. The catch, and it’s a real one, is verbosity: the model generated 75 million output tokens in Artificial Analysis benchmarking against a class median of 25 million, three times the output of comparable models. For developers deciding whether to route agentic coding workloads through a managed API or a self-hosted open-weight model, this is the clearest open-source contender to emerge in 2026.
What North Mini Code Is
North Mini Code is not a general-purpose model fine-tuned on code. Cohere built it from scratch for agentic software engineering: architecture mapping, multi-file code review, terminal-based tasks, and sub-agent orchestration. The architecture is a sparse mixture-of-experts Transformer with 128 experts and 8 activated per token, giving it roughly the per-token compute of a 3B dense model at inference time, although all 30B parameters must still be held in memory. Cohere co-founder Nick Frosst demoed it running on a Mac Studio via MLX at around 20 gigabytes of RAM, the same machine he uses for his own local coding work.
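The "8 of 128 experts per token" idea can be illustrated with a generic top-k router. This is a minimal sketch of the general technique, not Cohere's router: the random gate scores stand in for a learned gating layer, and the function name `route` is made up for this example.

```python
import math
import random

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # from the article


def route(gate_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
# Stand-in for the scores a learned gate would produce for one token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)

print(len(selected), "of", NUM_EXPERTS, "experts run for this token")
print("fraction of experts active:", TOP_K / NUM_EXPERTS)
```

Only the selected experts do any work for that token, which is where the compute saving comes from. Note that 8 of 128 experts is 6.25% of the experts, while 3B of 30B is 10% of the parameters. The gap is consistent with weights that every token uses, such as attention and embedding layers, though the article gives no breakdown.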
Key Specs
- 30B MoE model, 128 experts, 8 active per token (3B active parameters at inference)
- 256,000 token context window; 64,000 token maximum generation length
- Apache 2.0 license
- Available on Hugging Face in BF16 and FP8 quantized weights, and via the Cohere API
- Integrated into OpenCode, SWE-Agent, and mini-SWE-Agent scaffolds
- Native Transformers support (transformers >= 5.9.0); vLLM and SGLang paths in active development
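A quick way to sanity-check the single-H100 claim is to estimate the weight footprint at each precision. The sketch below counts weights only and ignores KV cache, activations, and runtime overhead. The 80 GB card size and the 4-bit row are assumptions added for illustration; the article lists only BF16 and FP8 weights.

```python
TOTAL_PARAMS = 30e9  # from the article

# Bytes per parameter. The 4-bit entry is an assumption, not a published format.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "4-bit (assumed)": 0.5}

H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_gb(params, bytes_per_param):
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


footprints = {name: weight_gb(TOTAL_PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in footprints.items():
    headroom = H100_MEMORY_GB - gb
    print(f"{name}: ~{gb:.0f} GB of weights, ~{headroom:.0f} GB left on an {H100_MEMORY_GB} GB card")
```

Under these assumptions BF16 weights take about 60 GB and FP8 about 30 GB, so both fit on one card, with FP8 leaving far more room for long-context KV cache. The roughly 20 GB Mac Studio figure is in the range you would expect from a lower-bit quantization plus overhead, but the article does not say which format was used.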
How Cohere Trained It
The training pipeline is the most interesting part of this release. Most coding models fine-tune a general-purpose base on code data. Cohere ran a two-stage supervised fine-tuning process followed by reinforcement learning with verifiable rewards (RLVR) across three distinct agent scaffolds simultaneously:
- SWE-Agent — a rich CLI with specialized commands (bash, str_replace_editor, submit)
- mini-SWE-Agent — a single bash tool with raw shell output, no scaffolding
- OpenCode — individually typed tools returning structured JSON
Training across all three scaffolds, rather than optimizing for one, gained 10 percentage points on the OpenCode evaluation while maintaining SWE-Agent numbers. That cross-harness robustness matters in production: agents encounter diverse tooling environments, and models optimized for one scaffold tend to degrade when you swap to another.
The training dataset covered over 70,000 verifiable tasks across approximately 5,000 real-world repositories, deduplicated against SWE-Bench and SWE-Bench Pro to prevent evaluation leakage. RLVR improved Terminal-Bench v2 performance by 7.9 percentage points and SWE-Bench Verified by 3.0 percentage points over the SFT-only checkpoint. In internal human pairwise evaluation across 85 samples, the final model won 66.1% of comparisons against its SFT-only counterpart on code editing tasks.
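With only 85 samples, it is worth asking how much uncertainty sits around the 66.1% win rate. The sketch below computes a 95% Wilson score interval. It treats the 85 comparisons as independent trials, which is an assumption: the article does not describe how the evaluation was run.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


WIN_RATE = 0.661  # from the article
SAMPLES = 85      # from the article

low, high = wilson_interval(WIN_RATE, SAMPLES)
print(f"95% interval for the win rate: {low:.3f} to {high:.3f}")
```

The interval runs from roughly 0.56 to 0.75. The lower end stays above 0.5, so the preference for the RLVR model is unlikely to be noise under this assumption, but the true margin could be much smaller than the headline number.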
Benchmark Numbers and the Verbosity Warning
On the Artificial Analysis Coding Index, North Mini Code scores 33.4, outperforming Qwen3.5 35B, Gemma 4 26B, and Devstral Small 2 (24B dense), as well as substantially larger models including Nemotron 3 Super (120B) and Devstral 2 (123B). Against Devstral Small 2 specifically, Cohere reports 2.8x higher output throughput and 30% lower inter-token latency under identical hardware conditions in internal testing.
The throughput advantage is real. The verbosity issue is equally real, and it’s what Artificial Analysis surfaced independently: 75 million output tokens to complete the Intelligence Index against a class median of 25 million. At the same cost per token, three times the output tokens means roughly three times the generation cost in high-volume pipelines, and benchmark pass rates do not surface this. “Suddenly people are thinking like hey, am I getting enough economic value out of the tokens from a model?” Frosst said during the launch video. Run your actual workload volume before committing to self-hosted inference on this model.
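One way to put numbers on that advice is to compare cost per task rather than cost per token. The sketch below is a back-of-envelope model, not a measurement. The GPU hourly rate and the baseline tokens per task are hypothetical placeholders, and using 210 tokens per second as sustained self-hosted throughput is an assumption. It also assumes the managed model sits at the class-median verbosity. The $50 per million figure comes from the comparison table.

```python
def self_hosted_price_per_million(gpu_dollars_per_hour, tokens_per_second):
    """Effective dollars per million output tokens on hardware you pay for by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


def cost_per_task(output_tokens, price_per_million):
    """Dollars spent on output tokens for one task."""
    return output_tokens / 1_000_000 * price_per_million


GPU_DOLLARS_PER_HOUR = 3.00        # hypothetical H100 rental rate
SUSTAINED_TOKENS_PER_SECOND = 210  # article's measured speed, assumed sustained
BASELINE_TOKENS_PER_TASK = 10_000  # hypothetical median-verbosity task
VERBOSITY_MULTIPLIER = 3           # 75M vs 25M output tokens in the article
MANAGED_PRICE_PER_MILLION = 50.0   # from the comparison table

self_price = self_hosted_price_per_million(GPU_DOLLARS_PER_HOUR, SUSTAINED_TOKENS_PER_SECOND)
self_task = cost_per_task(BASELINE_TOKENS_PER_TASK * VERBOSITY_MULTIPLIER, self_price)
managed_task = cost_per_task(BASELINE_TOKENS_PER_TASK, MANAGED_PRICE_PER_MILLION)

print(f"self-hosted: ${self_price:.2f} per million tokens, ${self_task:.3f} per task")
print(f"managed API: ${MANAGED_PRICE_PER_MILLION:.2f} per million tokens, ${managed_task:.3f} per task")
```

The useful output is the per-task figure, since that is where the 3x multiplier lands. Replace every placeholder with your own GPU pricing, measured throughput under real concurrency, and token counts from your own tasks before drawing conclusions.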
North Mini Code vs Claude Fable 5 vs Devstral Small 2
| Feature | North Mini Code | Claude Fable 5 | Devstral Small 2 |
|---|---|---|---|
| Parameters | 30B MoE (3B active) | Proprietary (undisclosed) | 24B dense |
| License | Apache 2.0 | Proprietary | Apache 2.0 |
| Deployment | Self-hosted, single H100 | Managed API only | Self-hosted |
| Output pricing | Infrastructure cost only | $50 per million tokens | Infrastructure cost only |
| Context window | 256K tokens | Undisclosed | 32K tokens |
| Multi-harness agentic training | Yes — 3 scaffolds (SWE-Agent, mini-SWE, OpenCode) | Undisclosed | Yes |
| Output verbosity vs median | 3x class median | N/A (managed) | Baseline |
| Output speed (Artificial Analysis) | 210 tok/s (8th of 127) | N/A (managed) | Slower (North Mini Code is 2.8x faster in Cohere’s internal testing) |
What This Means for Your Agentic Stack
This release forces three decisions on teams building production agentic coding pipelines:
- Purpose-built agentic training is now the evaluation baseline. Any vendor claiming agentic coding capability should be able to tell you whether its model was trained on verifiable agentic tasks across multiple harnesses, or was adapted from a general base. North Mini Code makes that a harder question to dodge.
- Verbosity is a hidden pipeline cost that benchmark tables skip. Three times the median output token count compounds into real latency and compute cost at scale. Test your actual workload volume against both a self-hosted path and your current managed API before switching.
- Data sovereignty now has benchmark-backed options. For teams handling proprietary codebases or operating in regulated environments, a self-hosted MoE model on a single H100 that outperforms 120B+ open-weight models is a concrete architectural option. That wasn’t true a year ago.
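The latency side of the verbosity point above can be estimated as time to first token plus output tokens divided by output speed. The sketch below uses the article's figures for North Mini Code and the class-median time to first token. The 1,000-token baseline task is hypothetical, and applying the 3x benchmark-wide ratio to a single task is an assumption.

```python
def completion_seconds(ttft_s, output_tokens, tokens_per_second):
    """Approximate wall-clock time for one streamed response."""
    return ttft_s + output_tokens / tokens_per_second


def break_even_speed(target_seconds, ttft_s, output_tokens):
    """Output speed a model needs to finish output_tokens in target_seconds."""
    return output_tokens / (target_seconds - ttft_s)


BASELINE_TOKENS = 1_000  # hypothetical median-verbosity response
NORTH_TOKENS = 3 * BASELINE_TOKENS  # 3x verbosity from the article

# 0.25 s time to first token and 210 tok/s are the article's figures.
north_seconds = completion_seconds(0.25, NORTH_TOKENS, 210)

# 1.95 s is the class-median time to first token from the article.
needed_speed = break_even_speed(north_seconds, 1.95, BASELINE_TOKENS)

print(f"North Mini Code: {north_seconds:.1f} s for {NORTH_TOKENS} tokens")
print(f"A median-verbosity model ties at about {needed_speed:.0f} tok/s")
```

Under these assumptions, a model with median verbosity and median time to first token finishes sooner whenever it sustains more than about 79 tokens per second. High output speed offsets the extra tokens only partly, which is why measuring end-to-end task time on your own workload matters more than tokens per second alone.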
For context on how North Mini Code fits into the broader coding assistant landscape, the best AI coding assistants ranked for 2026 covers the full field including managed and self-hosted options. For a head-to-head on managed tools, the Claude Code vs GitHub Copilot comparison is the most data-driven breakdown available. Developers building multi-model agentic setups can test open-weight model backends directly via Bind AI’s IDE.
The Bottom Line
North Mini Code is the most technically credible open-source coding agent released in 2026. It outperforms models four times its size on the Artificial Analysis Coding Index, runs on a single H100, and carries an Apache 2.0 license with no usage restrictions. The verbosity problem is genuine — three times the median output token count will compound into real cost and latency at production scale. Before swapping your managed API calls for self-hosted inference, run this model against your actual workload volume. The weights are on Hugging Face now. That’s the right starting point. Frosst’s framing — “small, open source, transparent and sovereign vs large, expensive, proprietary and hegemonic” — is provocative, but the benchmark data gives it something to stand on.