DeepSeek V4 lands: 1M context, mHC residuals, Muon

5 articles scored today

Quiet week, then DeepSeek drops 484 days of architecture work in one report. Huawei and PPIO had adapters ready at launch.

DeepSeek V4 report: 484 days of architecture work, fully disclosed

QbitAI · zh · sebmeter 97

V4-Pro is 1.6T total / 49B active; V4-Flash is 284B / 13B. The interesting parts: hybrid compressed attention (CSA+HCA) cuts KV cache to 10% of V3.2 at 1M context, Manifold-Constrained Hyper-Connections (mHC) keep ultra-deep residual streams stable, and Muon replaces AdamW for most parameters, using RMSNorm rather than Kimi K2's QK-Clip to keep attention logits bounded. Post-training is on-policy distillation from domain experts (math, code, agent, instruction) instead of mixed RL. V4-Pro-Max hits Codeforces 3206, ranking 23rd globally and edging GPT-5.4. V4-Flash-Max matches GPT-5.2 reasoning at 13B active. Ascend 950 supernode batch availability is flagged for H2 2026.
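
The report's claim about RMSNorm keeping attention logits bounded is worth unpacking. A minimal numpy sketch of the idea, assuming the norm is applied per-vector to queries and keys before the dot product (the gamma scale and the exact placement are assumptions, not the paper's disclosed configuration):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # Normalize each vector to unit RMS along the last axis.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def bounded_attn_logits(q, k, gamma=1.0):
    """Attention logits with RMSNorm applied to q and k first.

    After normalization every row of q and k has L2 norm sqrt(d),
    so each dot product is at most d (Cauchy-Schwarz); with the
    1/sqrt(d) scale, logits stay within [-gamma^2*sqrt(d),
    gamma^2*sqrt(d)] no matter how large the optimizer lets the
    raw projections grow.
    """
    d = q.shape[-1]
    qn, kn = gamma * rms_norm(q), gamma * rms_norm(k)
    return qn @ kn.T / np.sqrt(d)

# Even with wildly scaled projections the logits stay bounded.
rng = np.random.default_rng(0)
q = 1e6 * rng.standard_normal((4, 64))   # exploding query activations
k = 1e6 * rng.standard_normal((8, 64))
logits = bounded_attn_logits(q, k)
assert np.max(np.abs(logits)) <= np.sqrt(64)  # |logit| <= gamma^2 * sqrt(d)
```

The appeal over a clipping scheme like QK-Clip is that the bound holds by construction rather than by intervention, which matters when Muon's update geometry differs from AdamW's.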

This is the architecture-depth release of the year so far: an optimizer swap, a residual redesign, an attention rework, and a post-training overhaul, all in one model.

Read at QbitAI → https://www.qbitai.com/2026/04/406809.html

PPIO ships V4 preview with 1M context out of the box

QbitAI · zh · sebmeter 82

Concrete efficiency numbers worth pinning down: V4-Flash runs at 10% of V3.2's FLOPs and 7% of its KV cache at 1M context, via DeepSeek Sparse Attention (DSA) plus attention compression. PPIO is first to serve both variants via API. Agentic coding on V4-Pro reportedly beats Claude Sonnet 4.5.
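
To make the 7% KV-cache figure concrete, here is back-of-envelope memory math at 1M context. Only the 7% ratio comes from the article; the layer count, KV head count, and head dim below are hypothetical placeholders, not V4-Flash's real configuration:

```python
# Back-of-envelope KV-cache math at 1M context.
CTX = 1_000_000
layers, kv_heads, head_dim = 60, 8, 128   # assumed GQA-style config, not V4's
bytes_per = 2                             # fp16/bf16

# K and V tensors, per sequence
baseline_gb = CTX * layers * kv_heads * head_dim * 2 * bytes_per / 1e9
compressed_gb = 0.07 * baseline_gb        # V4-Flash: 7% of V3.2's KV cache

print(f"baseline:   {baseline_gb:.0f} GB per 1M-token sequence")
print(f"compressed: {compressed_gb:.1f} GB per 1M-token sequence")
```

Under these assumed dimensions, a full-precision cache that would blow past a single accelerator's HBM shrinks to something batchable, which is what makes day-one 1M-context serving plausible at all.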

The cost curve for long-context Chinese open-weights inference just bent again. PPIO's day-one availability matters for anyone benchmarking deployment.

Read at QbitAI → https://www.qbitai.com/2026/04/406802.html

V4 efficiency numbers and PPIO's acceleration engine

QbitAI · zh · sebmeter 72

V4-Pro at 1M context: 27% of V3.2's per-token FLOPs, 10% of its KV cache. V4-Flash drops further to 10% / 7%. PPIO claims 10x-plus inference cost reduction via their own acceleration stack on top. Agent coding is positioned between Sonnet 4.5 and Opus 4.6 non-thinking.
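
If you are doing the deployment math, the article's ratios translate directly into first-order serving multipliers. Only the percentages are from the piece; the derivation below is plain arithmetic against a V3.2 baseline normalized to 1.0, ignoring bandwidth limits and kernel overlap:

```python
# Per-token FLOPs and KV-cache ratios vs V3.2 at 1M context,
# as reported in the article.
ratios = {
    "V4-Pro":   {"flops": 0.27, "kv": 0.10},
    "V4-Flash": {"flops": 0.10, "kv": 0.07},
}

def implied_multipliers(model):
    """First-order serving gains implied by the ratios.

    If per-token FLOPs drop to r_f of baseline, compute-bound
    throughput rises by ~1/r_f; if KV cache drops to r_kv, the
    same memory holds ~1/r_kv as many concurrent 1M-context
    sequences.
    """
    r = ratios[model]
    return 1 / r["flops"], 1 / r["kv"]

for name in ratios:
    tput, batch = implied_multipliers(name)
    print(f"{name}: ~{tput:.1f}x throughput, ~{batch:.0f}x 1M-context batch headroom")
```

That is roughly 3.7x compute headroom for V4-Pro and 10x for V4-Flash before PPIO's claimed acceleration stack is layered on top.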

Second QbitAI angle on the same launch, but the FLOPs breakdown for V4-Pro specifically is useful if you're doing deployment math.

Read at QbitAI → https://www.qbitai.com/2026/04/406760.html

Huawei Cloud first to adapt V4 on Ascend

QbitAI · zh · sebmeter 72

Huawei Cloud shipped native 1M-context inference at launch via layered KV cache compression, 10-plus Ascend fused operators, async scheduling, and MTP speculative decoding. V4-Flash is one-click on the MaaS platform. Kingsoft Office and 360 are named as early enterprise users.
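
The MTP speculative decoding piece rests on a simple draft-and-verify loop. A toy sketch of that loop, with stand-in "models" that are pure illustration and reflect nothing about Huawei's actual implementation:

```python
# Toy draft-and-verify loop behind MTP-style speculative decoding:
# a cheap draft head proposes k tokens, the full model verifies
# them, and the longest agreeing prefix is accepted.
def draft_model(prefix, k):
    # Cheap guesser: continue a simple +1 arithmetic pattern.
    return [(prefix[-1] + 1 + i) % 100 for i in range(k)]

def target_model(prefix):
    # Expensive model's next token (here: +1 mod 100, except it
    # "disagrees" right after token 50 to force a rejection).
    return (prefix[-1] + 1) % 100 if prefix[-1] != 50 else 7

def speculative_decode(prefix, steps, k=4):
    seq, verify_calls = list(prefix), 0
    for _ in range(steps):
        proposal = draft_model(seq, k)
        accepted = []
        for tok in proposal:
            # In a real engine all k positions are scored in one
            # batched forward pass; this loop simulates that.
            verify = target_model(seq + accepted)
            verify_calls += 1
            if tok == verify:
                accepted.append(tok)
            else:
                accepted.append(verify)  # take the target's token, stop
                break
        seq += accepted
    return seq, verify_calls
```

When the draft agrees, each verify pass yields several tokens instead of one; when it diverges, the target's own token is kept, so output is identical to plain decoding, just faster on agreeable stretches.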

DeepSeek plus Ascend on day one is the sovereign-stack story playing out in real time. Worth tracking how the operator-level optimizations compare to NVIDIA paths.

Read at QbitAI → https://www.qbitai.com/2026/04/406791.html

Skim pile

  • A unified AI base for the auto industry · QbitAI · Volcano Engine's Doubao now in 100% of major Chinese OEMs and 700M-plus vehicles, moving from multi-agent patchwork to a single perception-reasoning-execution loop · 68

Manage subscription / unsubscribe: {UNSUB_URL}


Kiri Media AB
Kungstensgatan 27
113 57 Stockholm
Sweden
Contact
sebastian@kirimedia.co +46 8 000 00 00