DeepSeek V4 lands: 1M context, mHC residuals, Muon
Quiet week, then DeepSeek drops 484 days of architecture work in one report. Huawei and PPIO had adapters ready at launch.
DeepSeek V4 report: 484 days of architecture work, fully disclosed
V4-Pro is 1.6T total / 49B active; V4-Flash is 284B total / 13B active. The interesting parts: hybrid compressed attention (CSA+HCA) cuts the KV cache to 10% of V3.2's at 1M context, Manifold-Constrained Hyper-Connections (mHC) keep ultra-deep residual streams stable, and Muon replaces AdamW for most parameters, with RMSNorm rather than Kimi K2's QK-Clip keeping attention logits bounded. Post-training is on-policy distillation from domain experts (math, code, agent, instruction) instead of mixed RL. V4-Pro-Max hits Codeforces 3206, ranking 23rd globally and edging out GPT-5.4. V4-Flash-Max matches GPT-5.2 reasoning at 13B active. Ascend 950 supernode batch availability is flagged for H2 2026.
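If Muon is unfamiliar: it keeps SGD-style momentum but orthogonalizes each 2D weight update via a Newton-Schulz iteration before applying it. A minimal sketch below, using the coefficients from the public reference implementation; DeepSeek's production variant, and the RMSNorm logit bounding it pairs with, may well differ in detail.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that pushes a 2D momentum matrix
    # toward the nearest (semi-)orthogonal matrix. Coefficients are from
    # the public Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + eps)            # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                       # iterate on the short side
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x

@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    # One Muon update: accumulate momentum, orthogonalize, apply.
    momentum_buf.mul_(beta).add_(grad)
    param.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```

In the reference setup, AdamW still handles embeddings, norms, and other non-matrix parameters, which lines up with "most parameters" above.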
This is the most architecture-dense release of the year so far: optimizer swap, residual redesign, attention rework, post-training rework, all in one model.
Read at QbitAI → https://www.qbitai.com/2026/04/406809.html
PPIO ships V4 preview with 1M context out of the box
Concrete efficiency numbers worth pinning down: V4-Flash runs at 10% of V3.2's FLOPs and 7% of its KV cache at 1M context, via DeepSeek Sparse Attention (DSA) plus attention compression. PPIO is first to serve both variants via API. Agentic coding on V4-Pro reportedly beats Claude Sonnet 4.5.
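The article gives no client-side details, but most third-party serving platforms expose an OpenAI-compatible endpoint, so a long-context call would look roughly like the sketch below. The base URL and model id are placeholders, not PPIO's documented values; check their API docs.

```python
from openai import OpenAI

# Placeholders only -- substitute PPIO's actual endpoint and model id.
client = OpenAI(
    base_url="https://api.example-ppio-endpoint.com/v1",
    api_key="YOUR_PPIO_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize this repo dump: ..."}],
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```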
The cost curve for long-context Chinese open-weights inference just bent again. PPIO shipping on day one matters for anyone benchmarking deployment.
Read at QbitAI → https://www.qbitai.com/2026/04/406802.html
V4 efficiency numbers and PPIO's acceleration engine
V4-Pro at 1M context: 27% of V3.2's per-token FLOPs, 10% of its KV cache. V4-Flash drops further to 10% / 7%. PPIO claims a 10x-plus inference cost reduction via its own acceleration stack on top. Agentic coding is positioned between Claude Sonnet 4.5 and Opus 4.6 (non-thinking).
Second QbitAI angle on the same launch, but the FLOPs breakdown for V4-Pro specifically is useful if you're doing deployment math.
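As a starting point for that math, a back-of-envelope sketch: apply the quoted cache ratios to a hypothetical baseline. The layer count and per-token element count below are placeholder V3-era figures, not disclosed V3.2 or V4 numbers.

```python
def kv_cache_gib(seq_len, layers, elems_per_token_per_layer, bytes_per_elem=2):
    # Generic KV-cache footprint at fp16/bf16 (2 bytes per element).
    return seq_len * layers * elems_per_token_per_layer * bytes_per_elem / 2**30

# Placeholder baseline loosely modeled on V3-era MLA (61 layers,
# 576 cached elements per token per layer) -- illustrative only.
baseline = kv_cache_gib(seq_len=1_000_000, layers=61, elems_per_token_per_layer=576)
print(f"baseline at 1M tokens: {baseline:.0f} GiB")  # ~65 GiB
print(f"at 10% (V4-Pro):       {0.10 * baseline:.0f} GiB")
print(f"at 7% (V4-Flash):      {0.07 * baseline:.0f} GiB")
```

The per-token FLOPs ratios scale compute cost the same way.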
Read at QbitAI → https://www.qbitai.com/2026/04/406760.html
Huawei Cloud first to adapt V4 on Ascend
Huawei Cloud shipped native 1M-context inference at launch via layered KV cache compression, ten-plus fused Ascend operators, async scheduling, and multi-token-prediction (MTP) speculative decoding. V4-Flash is one-click on the MaaS platform. Kingsoft Office and 360 are named as early enterprise users.
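For anyone who hasn't met MTP speculative decoding: the MTP heads draft a few future tokens cheaply, the full model verifies them in one forward pass, and the longest agreeing prefix is accepted. A framework-agnostic sketch of the greedy accept rule, with hypothetical inputs (not Huawei's implementation):

```python
def accept_drafts(draft_tokens, verifier_tokens):
    # draft_tokens: k token ids proposed by the MTP draft head(s).
    # verifier_tokens: k+1 greedy token ids from ONE full-model forward
    #   pass over the context extended with the drafts; position i is the
    #   model's choice after drafts 0..i-1 are accepted.
    accepted = []
    for i, t in enumerate(draft_tokens):
        if t == verifier_tokens[i]:
            accepted.append(t)                   # draft agrees: keep it
        else:
            accepted.append(verifier_tokens[i])  # disagree: fix and stop
            return accepted
    # All drafts accepted: take the verifier's bonus token too.
    accepted.append(verifier_tokens[len(draft_tokens)])
    return accepted

# Example: first two drafts agree -> 3 tokens land for one full-model pass.
print(accept_drafts([11, 22, 33, 44], [11, 22, 99, 44, 55]))  # [11, 22, 99]
```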
DeepSeek plus Ascend on day one is the sovereign-stack story playing out in real time. Worth tracking how the operator-level optimizations compare to NVIDIA paths.
Read at QbitAI → https://www.qbitai.com/2026/04/406791.html
Skim pile
- A unified AI base for the auto industry · QbitAI · Volcano Engine's Doubao now in 100% of major Chinese OEMs and 700M-plus vehicles, moving from multi-agent patchwork to a single perception-reasoning-execution loop
Manage subscription / unsubscribe: {UNSUB_URL}