DeepSeek V4 lands: 1M context, mHC residuals, Muon
Quiet week, then DeepSeek drops 484 days of architecture work in one report. Huawei and PPIO had adapters ready at launch.
DeepSeek V4 report: 484 days of architecture work, fully disclosed
V4-Pro is 1.6T total / 49B active; V4-Flash is 284B total / 13B active. The interesting parts: hybrid compressed attention (CSA+HCA) cuts the KV cache to 10% of V3.2's at 1M context, Manifold-Constrained Hyper-Connections (mHC) keep ultra-deep residual streams stable, and Muon replaces AdamW for most parameters, with RMSNorm rather than Kimi K2's QK-Clip keeping attention logits bounded. Post-training is on-policy distillation from domain experts (math, code, agent, instruction) instead of mixed RL. V4-Pro-Max hits Codeforces 3206, ranking 23rd globally and edging out GPT-5.4. V4-Flash-Max matches GPT-5.2 reasoning at 13B active. Ascend 950 supernode batch availability is flagged for H2 2026.
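If Muon is unfamiliar: it keeps SGD-style momentum but orthogonalizes each 2D weight update via a Newton-Schulz iteration before applying it. A minimal sketch below, using the coefficients from the public reference implementation; DeepSeek's production variant, and the RMSNorm logit bounding it pairs with, may well differ in detail.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that pushes a 2D momentum matrix
    # toward the nearest (semi-)orthogonal matrix. Coefficients are from
    # the public Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + eps)            # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:                       # iterate on the short side
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x.T if transposed else x

@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    # One Muon update: accumulate momentum, orthogonalize, apply.
    momentum_buf.mul_(beta).add_(grad)
    param.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```

In the reference setup, AdamW still handles embeddings, norms, and other non-matrix parameters, which lines up with "most parameters" above.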
This is the most architecture-dense release of the year so far: optimizer swap, residual redesign, attention rework, post-training rework, all in one model.
Read at QbitAI → https://www.qbitai.com/2026/04/406809.html
PPIO ships V4 preview with 1M context out of the box
Concrete efficiency numbers worth pinning down: V4-Flash runs at 10% of V3.2's FLOPs and 7% of its KV cache at 1M context, via DeepSeek Sparse Attention (DSA) plus attention compression. PPIO is first to serve both variants via API. Agentic coding on V4-Pro reportedly beats Claude Sonnet 4.5.
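The article gives no client-side details, but most third-party serving platforms expose an OpenAI-compatible endpoint, so a long-context call would look roughly like the sketch below. The base URL and model id are placeholders, not PPIO's documented values; check their API docs.

```python
from openai import OpenAI

# Placeholders only -- substitute PPIO's actual endpoint and model id.
client = OpenAI(
    base_url="https://api.example-ppio-endpoint.com/v1",
    api_key="YOUR_PPIO_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize this repo dump: ..."}],
    max_tokens=1024,
)
print(resp.choices[0].message.content)
```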
The cost curve for long-context Chinese open-weights inference just bent again. PPIO shipping on day one matters for anyone benchmarking deployment.
Read at QbitAI → https://www.qbitai.com/2026/04/406802.html
V4 efficiency numbers and PPIO's acceleration engine
V4-Pro at 1M context: 27% of V3.2's per-token FLOPs, 10% of its KV cache. V4-Flash drops further to 10% / 7%. PPIO claims a 10x-plus inference cost reduction via its own acceleration stack on top. Agentic coding is positioned between Claude Sonnet 4.5 and Opus 4.6 (non-thinking).
Second QbitAI angle on the same launch, but the FLOPs breakdown for V4-Pro specifically is useful if you're doing deployment math.
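As a starting point for that math, a back-of-envelope sketch: apply the quoted cache ratios to a hypothetical baseline. The layer count and per-token element count below are placeholder V3-era figures, not disclosed V3.2 or V4 numbers.

```python
def kv_cache_gib(seq_len, layers, elems_per_token_per_layer, bytes_per_elem=2):
    # Generic KV-cache footprint at fp16/bf16 (2 bytes per element).
    return seq_len * layers * elems_per_token_per_layer * bytes_per_elem / 2**30

# Placeholder baseline loosely modeled on V3-era MLA (61 layers,
# 576 cached elements per token per layer) -- illustrative only.
baseline = kv_cache_gib(seq_len=1_000_000, layers=61, elems_per_token_per_layer=576)
print(f"baseline at 1M tokens: {baseline:.0f} GiB")  # ~65 GiB
print(f"at 10% (V4-Pro):       {0.10 * baseline:.0f} GiB")
print(f"at 7% (V4-Flash):      {0.07 * baseline:.0f} GiB")
```

The per-token FLOPs ratios scale compute cost the same way.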
Read at QbitAI → https://www.qbitai.com/2026/04/406760.html
Huawei Cloud first to adapt V4 on Ascend
Huawei Cloud shipped native 1M-context inference at launch via layered KV cache compression, ten-plus fused Ascend operators, async scheduling, and multi-token-prediction (MTP) speculative decoding. V4-Flash is one-click on the MaaS platform. Kingsoft Office and 360 are named as early enterprise users.
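For anyone who hasn't met MTP speculative decoding: the MTP heads draft a few future tokens cheaply, the full model verifies them in one forward pass, and the longest agreeing prefix is accepted. A framework-agnostic sketch of the greedy accept rule, with hypothetical inputs (not Huawei's implementation):

```python
def accept_drafts(draft_tokens, verifier_tokens):
    # draft_tokens: k token ids proposed by the MTP draft head(s).
    # verifier_tokens: k+1 greedy token ids from ONE full-model forward
    #   pass over the context extended with the drafts; position i is the
    #   model's choice after drafts 0..i-1 are accepted.
    accepted = []
    for i, t in enumerate(draft_tokens):
        if t == verifier_tokens[i]:
            accepted.append(t)                   # draft agrees: keep it
        else:
            accepted.append(verifier_tokens[i])  # disagree: fix and stop
            return accepted
    # All drafts accepted: take the verifier's bonus token too.
    accepted.append(verifier_tokens[len(draft_tokens)])
    return accepted

# Example: first two drafts agree -> 3 tokens land for one full-model pass.
print(accept_drafts([11, 22, 33, 44], [11, 22, 99, 44, 55]))  # [11, 22, 99]
```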
DeepSeek plus Ascend on day one is the sovereign-stack story playing out in real time. Worth tracking how the operator-level optimizations compare to NVIDIA paths.
Read at QbitAI → https://www.qbitai.com/2026/04/406791.html
Skim pile
- A unified AI base for the auto industry · QbitAI · Volcano Engine's Doubao now in 100% of major Chinese OEMs and 700M-plus vehicles, moving from multi-agent patchwork to a single perception-reasoning-execution loop
Manage subscription / unsubscribe: {UNSUB_URL}