Good morning. The DeepSeek V4 story is evolving in interesting directions — early benchmarkers are finding the Pro variant undertrained and token-hungry, while Flash quietly looks like the smarter pick. Elsewhere, Cohere and Aleph Alpha are merging to chase the “sovereign AI” niche, Xiaomi’s MiMo V2.5 Pro is getting unexpected love from local-model enthusiasts, and Anthropic ran a small but unsettling experiment in agent-on-agent commerce.
DeepSeek V4 Pro looks undertrained. Two days in, the verdict on DeepSeek V4 Pro is sharper than the launch-day enthusiasm suggested. A widely shared Reddit post compares token usage on Artificial Analysis benchmarks: V4 Pro burns roughly 2.5x more tokens than GPT-5.5 High to reach similar scores, and its hallucination rates are unusually high. The leading theory in the thread is that this release prioritized getting the model running on Huawei Ascend chips over squeezing out final performance, with a V4.1 likely to follow. One commenter flagged that the “intelligence density” framing conflates parameter count with reasoning efficiency: for a 1.6T MoE, what matters is active params per token, so the headline size comparison is misleading.
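To make that concrete, here is a minimal back-of-envelope sketch in Python. Every specific in it is an assumption for illustration (the expert count, the shared-weight fraction, equal per-token pricing); nothing reflects published V4 internals. It just shows why active params per token, not the 1.6T headline, sets per-token compute, and why a 2.5x token multiplier dominates the cost comparison at score parity.

```python
# Hypothetical MoE sizing: the headline number (total params) vs. what each
# token actually uses. Expert count and shared fraction are illustrative
# guesses, not DeepSeek V4 specs.

def moe_active_params(total_params: float, n_experts: int,
                      active_experts: int, shared_frac: float = 0.2) -> float:
    """Rough active-params-per-token estimate for a mixture-of-experts model.

    Assumes `shared_frac` of the weights (attention, embeddings, any shared
    expert) run on every token, with the rest split evenly across routed
    experts.
    """
    shared = total_params * shared_frac
    routed = total_params - shared
    return shared + routed * (active_experts / n_experts)

# A hypothetical 1.6T MoE routing 8 of 128 experts per token:
active = moe_active_params(1.6e12, n_experts=128, active_experts=8)
print(f"active params/token: {active / 1e9:.0f}B of 1600B total")  # -> 400B

# Token-normalized cost: equal benchmark scores at 2.5x the tokens means
# roughly 2.5x the compute per solved task, assuming equal per-token pricing.
relative_tokens = 2.5   # from the Reddit comparison
relative_price = 1.0    # assumption: same per-token price
print(f"relative cost at score parity: {relative_tokens * relative_price:.1f}x")
```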
Cohere absorbs Aleph Alpha at a $20B valuation. Cohere is acquiring Germany’s Aleph Alpha to pitch enterprises a non-American “sovereign AI” option, with Schwarz Group leading a Series E and putting up €500 million in structured financing, per TechCrunch. The combined valuation jumps to roughly $20 billion against about $240 million in combined ARR, a multiple north of 80x revenue, and Aleph Alpha was contributing very little of that revenue. It’s a bet that regulated European industries will pay a premium for data residency and a non-US vendor stack, even as foundation models commoditize.
Xiaomi’s MiMo V2.5 Pro impresses, with caveats. Xiaomi’s new 1T-parameter model debuted at #54 on the Artificial Analysis Intelligence Index, with weights promised soon (Reddit thread). Several r/LocalLLaMA users say it tops Opus 4.6 on their own coding and agentic benchmarks, with one calling it the strongest writing model they’ve used regardless of origin. Skeptics in the thread point out that Xiaomi promised open weights for V2 Pro and Omni months ago and never delivered, so the “weights are coming” tag is doing a lot of work. At 1T parameters it’ll be cloud-only for most people anyway.
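For a sense of scale on that last point, a quick weight-size calculation (a sketch: it counts raw weights only, ignoring KV cache and activations, and even MoE sparsity wouldn’t help, since all experts must stay loaded for inference):

```python
# Raw weight storage for a 1T-parameter model at common quantization widths.
PARAMS = 1e12  # 1T parameters

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gigabytes = PARAMS * bits / 8 / 1e9  # params * bytes-per-param
    print(f"{name}: ~{gigabytes:,.0f} GB of weights")

# fp16: ~2,000 GB   int8: ~1,000 GB   int4: ~500 GB
```

Even an aggressive 4-bit quant lands around 500 GB before any KV cache, well past consumer GPUs and nearly all workstations.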
Anthropic ran an agent marketplace, and the results are awkward. Anthropic’s “Project Deal” let AI agents negotiate real transactions for 69 employees on an internal classifieds-style marketplace, completing 186 deals worth over $4,000 (TechCrunch). Two findings stood out: users with better underlying models got better outcomes but couldn’t tell they were being outcompeted, and the initial instructions humans gave their agents barely affected results. The first is the more uncomfortable one: it suggests an “agent quality gap” where pricing tiers translate directly into negotiating power, invisibly.
A theory paper on transformer succinctness. A new arXiv paper proves transformers can represent formal languages exponentially more compactly than finite automata or LTL formulas. The flip side: verifying properties of transformers is EXPSPACE-complete, so formal verification is computationally intractable in the general case. A useful theoretical companion to yesterday’s “scientific theory of deep learning” paper, even if the practical upshot cuts the other way: the compactness that makes transformers expressive is exactly what makes them hard to verify.
Two political stories worth noting briefly. UBC researchers published a paper in Science warning that coordinated AI persona swarms can manufacture consensus in online communities, citing examples from elections in the US, Taiwan, Indonesia, and India. Reddit’s response was almost uniformly “this happened years ago, see Cambridge Analytica.” Separately, the White House has accused China of “industrial-scale theft” of US AI technology (US News), a claim that landed flat on r/artificial, where commenters pointed to publicly distributed Claude-distilled models on Hugging Face and noted the irony of US labs that scraped the open web complaining about their outputs being scraped in turn.
That’s the briefing. Watch for a V4.1 checkpoint from DeepSeek in the coming weeks, and whether MiMo’s weights actually land this time.