ai0.news

AI News — April 25, 2026

Good morning. The DeepSeek V4 reaction wave is in full swing, OpenAI’s GPT-5.5 is now live in the API with eyebrow-raising pricing at long context, Google is reportedly preparing a $40B follow-on into Anthropic, and Anthropic itself just confirmed what Claude Code users have been muttering about for weeks: yes, the model really did get worse. Plus a paper arguing that deep learning is finally ready for a real scientific theory.

Google’s $40B Anthropic check, and what it implies. Bloomberg reports Google is preparing to invest up to $40 billion in Anthropic, on top of the multi-gigawatt TPU deal signed weeks ago and the Amazon arrangement before that. The HN thread reads the subtext clearly: Anthropic was running into severe capacity constraints and signed back-to-back compute deals with two competing hyperscalers to dig out. One commenter framed Anthropic as the industry’s collective insurance policy — every hyperscaler wants a survival stake in case their own models lose the race. Another less generous read: this is the same circular money flow that’s making people reach for dot-com comparisons.

GPT-5.5 hits the API, and the long-context pricing stings. A day after the ChatGPT/Codex rollout we covered yesterday, OpenAI pushed GPT-5.5 and 5.5 Pro to the API. Pricing splits at a 272K-token context threshold: $5/M input and $30/M output below it, $10/M and $45/M above — meaningfully more expensive than Claude Opus 4.7 once you cross the line. Reactions on HN are split between developers reporting genuinely better Codex behavior (one called it the second model to hit 25/25 on a private SQL benchmark, after Opus) and skeptics pointing to an 86% hallucination rate on artificialanalysis.ai versus Opus's 36%. There's also a recurring accusation in the thread that OpenAI-friendly accounts are flooding the comments to downplay Claude — make of that what you will.
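The tier jump is easier to feel as arithmetic. A minimal sketch, assuming the higher rate applies to the entire request once the prompt exceeds 272K tokens (the reporting doesn't spell out the tier mechanics, and `gpt55_cost` is a hypothetical helper, not anything in OpenAI's SDK):

```python
def gpt55_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one GPT-5.5 API call under the reported tiers.

    Assumption: the whole request is billed at the higher rate once the
    prompt crosses the 272K threshold, as with some other long-context
    pricing schemes; per-token overage billing would come out cheaper.
    """
    THRESHOLD = 272_000
    if input_tokens <= THRESHOLD:
        in_rate, out_rate = 5.00, 30.00   # $/M tokens below the threshold
    else:
        in_rate, out_rate = 10.00, 45.00  # $/M tokens above it
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 300K-token prompt with a 4K completion lands in the upper tier:
# 0.30 * 10 + 0.004 * 45 = 3.18 USD, versus 1.48 USD for 272K input
# just under the line — the threshold roughly doubles the bill.
```

Under that assumption, the cliff at 272K is steep enough that prompt-trimming right below the threshold becomes a real cost lever for agentic workloads.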

DeepSeek V4 gets the analysis treatment. With V4 out in the wild, the explainers are landing. MIT Tech Review’s three reasons it matters emphasizes the open-source pricing pressure and frontier-access expansion, while TechCrunch notes V4 Pro is now the largest open-weight model at 1.6T parameters (49B active), trailing GPT-5.4 and Gemini 3.1 Pro on knowledge benchmarks by an estimated 3-6 months. The Verge highlights the explicit Huawei chip compatibility — a meaningful signal for China’s domestic silicon push, even as US officials accuse DeepSeek of using banned Nvidia chips and Anthropic alleges Claude was misused in training.

The DeepSeek meme cycle is also in full swing. r/LocalLLaMA is having fun with the model: one post shows V4 nailing a classic lateral-thinking puzzle, with commenters quickly pointing out the question is almost certainly in training data and therefore useless as a reasoning test. Another thread jokes about V4 reaching “AGI” by correctly handling an orange-splitting puzzle. The serious takeaway buried in the jokes: community-favorite reasoning benchmarks have a shelf life, and most of them have expired.

Anthropic confirms Claude Code really did get dumber. Anthropic published an April 23 postmortem acknowledging three separate changes between March and April that degraded Claude Code: a reasoning effort downgrade, a memory-clearing bug, and a verbosity-reduction system prompt that hurt coding output. All three have been reverted as of April 20, and affected subscribers are getting usage-limit resets. The r/LocalLLaMA thread was flaired “Misleading” by mods for its hyperbolic framing, but the underlying frustration is fair: one commenter put it bluntly — businesses being told to fire staff and replace them with Claude agents potentially running at “cognitive 50%” without disclosure is a real SLA problem. The case for self-hosting, or at minimum mandatory release notes, is getting harder to argue against.

A theory of deep learning, maybe. A 41-page paper from 14 researchers argues that “learning mechanics” — a scientific theory of how neural networks train and generalize — is genuinely emerging, organized around five research strands including tractable mathematical limits and universal scaling behaviors. The framing is explicitly analogous to classical mechanics: falsifiable, quantitative, predictive. HN reactions are mixed; one practitioner praised the open-problems section as a solid map of the field, while others called the title oversold for what’s still a nascent program. The pragmatic case for caring: a real theory would let us predict failure modes and detect hallucination mechanically, rather than benchmark-by-benchmark.

That’s the morning. The DeepSeek V4 vs GPT-5.5 vs Opus 4.7 head-to-heads will probably dominate the next few days of benchmark Twitter — and somewhere in there, someone will quietly run them all locally and not care.
