DeepSeek V4 forced Anthropic to cut prices for the first time.

12 articles scored today

DeepSeek V4 — one week of production telemetry

Recode China AI · EN · Kir-News 89

Tencent's QClaw agent platform reports that the V4 cache-pricing cut held up under real workloads: agent coding spend fell 78% week over week relative to the GPT-5.4 baseline, with a 92% cache hit rate on multi-turn tasks. Alibaba's ACS Agent Sandboxing service switched its default model for new tenants from Qwen3.5-Plus to V4. ByteDance's Volcano Engine began routing internal Doubao agent jobs to a private V4 deployment running on Huawei Ascend behind the API. Inference cost is now $0.14 per 1M output tokens on cached prompts, under one-tenth the price of Claude Sonnet 4.6.

The first production telemetry since the V4 launch. The price floor isn't theoretical anymore.

Cambricon Siyuan 690 leaks — V4-tuned inference at 70% Hopper throughput

Synced Review · EN · Kir-News 84

Specs leaked over the weekend in a since-deleted post on a Cambricon developer forum: Siyuan 690 delivers ~600 TFLOPS FP8, with 192GB of HBM3 and an NVLink-equivalent interconnect at 1.4TB/s. Internal benchmarks reportedly put V4-Pro inference at 70% of Hopper throughput for roughly half the wall-clock cost. The chip targets the gap Ascend can't fill: dense compute for the labs that didn't get early V4 access. Cambricon's order book is reportedly oversubscribed through Q3 2026, with Tencent and ByteDance placing volume orders.
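A back-of-envelope check on the leaked numbers, assuming "half the wall-clock cost" means half the cost per accelerator-hour versus a Hopper part (the leak does not say which cost is meant). Per-token cost is hourly cost divided by tokens per hour, so the relative per-token cost is the hourly-cost ratio over the throughput ratio. The helper `relative_cost_per_token` is hypothetical, and the inputs are the leaked figures, not measurements.

```python
def relative_cost_per_token(throughput_ratio: float, hourly_cost_ratio: float) -> float:
    """Cost per token on chip B relative to chip A (hypothetical helper).

    throughput_ratio: tokens/s on B divided by tokens/s on A.
    hourly_cost_ratio: cost per hour of B divided by cost per hour of A
    (assumed reading of "wall-clock cost").
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    # cost/token = (cost/hour) / (tokens/hour), so B/A is the cost ratio over the throughput ratio
    return hourly_cost_ratio / throughput_ratio


# Leaked: ~70% of Hopper throughput at ~half the cost (assumed per hour).
ratio = relative_cost_per_token(throughput_ratio=0.70, hourly_cost_ratio=0.50)
print(f"{ratio:.2f}")  # 0.71
```

Under that reading, Siyuan 690 would serve V4-Pro at roughly 71% of Hopper's per-token cost. If the leak instead means half the cost per completed job, the per-token saving is the full 50%, and the lower throughput shows up only as latency.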

If Cambricon delivers, V4 inference economics get even better and the supply pressure on Nvidia eases on the Chinese side.

Claude Sonnet 4.7 ships — reliability fixes and a 35% cache-token cut

Import AI · EN · Kir-News 81

Anthropic released Sonnet 4.7 on Sunday evening, US time. Three things stand out. First, the changelog explicitly calls out the bug fixes from the two-month Claude Code saga. Second, and new: Anthropic cut cache-token pricing by 35% across the lineup, its first material price cut of the year. Third, agent throughput on internal runs of SWE-Bench Verified and BrowseComp jumped meaningfully without a parameter increase. On cached coding workloads, pricing now lands within 2× of V4, down from 5–6× last week.

The first evidence that Western labs are answering Chinese price pressure with API price cuts rather than capability narratives.

Verifier-swap RL: 7× compute efficiency from a single training trick

ChinAI Newsletter · EN · Kir-News 76

A joint Tsinghua / Shanghai AI Lab paper proposes rotating verifier models mid-training instead of fixing a single one. The reasoning: as the policy improves, the verifier becomes the bottleneck, so swapping in a stronger verifier every N steps keeps the reward signal informative. On a Qwen3-32B base, the team replicates DeepSeek-R1-Zero performance on AIME and LiveCodeBench using one-seventh the RL compute of vanilla GRPO. The verifiers are released as open weights at three capability tiers, and the results are reproducible from a public recipe.
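A minimal sketch of the swap schedule as summarized above, assuming three tiers and a swap every N steps: the active verifier escalates one tier at a time and then holds at the strongest. The function `verifier_for_step`, the toy verifiers, and `SWAP_EVERY = 100` are hypothetical illustrations, not the paper's code, and the GRPO update itself is omitted. The toy tiers differ only in strictness, which is enough to show why a stronger verifier keeps rewarding distinctions the weaker one has stopped making.

```python
from typing import Callable, Sequence

# A verifier maps (answer, reference) to a scalar reward.
Verifier = Callable[[str, str], float]


def verifier_for_step(step: int, swap_every: int, tiers: Sequence[Verifier]) -> Verifier:
    """Return the verifier used at a given RL step.

    Hypothetical schedule: move up one tier every `swap_every` steps,
    then stay on the strongest tier once it is reached.
    """
    if swap_every <= 0 or not tiers:
        raise ValueError("need swap_every > 0 and at least one tier")
    return tiers[min(step // swap_every, len(tiers) - 1)]


# Toy stand-ins for three capability tiers; NOT the released verifier models.
def weak(answer: str, reference: str) -> float:
    # Rewards any answer that mentions the reference anywhere.
    return 1.0 if reference in answer else 0.0


def medium(answer: str, reference: str) -> float:
    # Rewards answers whose final whitespace-separated token is the reference.
    tokens = answer.split()
    return 1.0 if tokens and tokens[-1] == reference else 0.0


def strong(answer: str, reference: str) -> float:
    # Rewards only an exact match, with no hedging or extra text.
    return 1.0 if answer.strip() == reference else 0.0


TIERS = [weak, medium, strong]
SWAP_EVERY = 100  # assumed N; the summary does not give a value

# A hedged answer earns reward from the weaker tiers but not the strongest.
answer, reference = "I think 41, final: 42", "42"
for step in (0, 150, 250):
    verifier = verifier_for_step(step, SWAP_EVERY, TIERS)
    print(step, verifier.__name__, verifier(answer, reference))
# 0 weak 1.0
# 150 medium 1.0
# 250 strong 0.0
```

Holding at the top tier matches the "swap to a stronger verifier" description; a cyclic schedule such as `tiers[(step // swap_every) % len(tiers)]` would be the other reading of "rotating", and the summary does not say which one the paper uses.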

Cheap RL efficiency wins keep landing. Worth reading the recipe — most teams don't rotate verifiers because nobody told them to.

Kiri Media AB
Kungstensgatan 27
113 57 Stockholm
Sweden
Contact
sebastian@kirimedia.co +46 8 000 00 00