Good morning. Anthropic’s reported $1.2 trillion valuation and a SpaceX Colossus deal are setting today’s dollar-figure tone, but the more interesting action is happening at the inference layer — three separate projects squeezing more tokens out of the same hardware. The Musk-Altman trial also keeps producing testimony with an unusually high gossip-to-legal-substance ratio.
Anthropic, $1.2T, and SpaceX’s Colossus 1. Headlines are claiming Anthropic has secured access to SpaceX’s Colossus 1 cluster (6,144 H100s) at a $1.2 trillion valuation with “80x growth,” though the Reddit thread immediately picks the framing apart: that’s an annualized rate extrapolated from only a couple of months of data, which works out to roughly 2x actual growth. Commenters were more interested in the Elon angle: renting his rockets-adjacent compute to a competitor reads like a quiet admission that xAI hasn’t kept pace with labs spending less. Treat the trillion-dollar number with the usual late-cycle skepticism.
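The annualization arithmetic is easy to check yourself. A minimal sketch, assuming the “80x” figure comes from extrapolating roughly two months of growth to a full twelve:

```python
# If a metric grows by factor r over m months, the annualized rate is r ** (12 / m).
# Working backward: an "80x annualized" claim built on ~2 months of data implies
# an actual observed growth factor of 80 ** (2 / 12) ≈ 2.08x.

def annualized(growth_factor: float, months: float) -> float:
    """Extrapolate a growth factor observed over `months` to a 12-month rate."""
    return growth_factor ** (12 / months)

def actual_from_annualized(annual_factor: float, months: float) -> float:
    """Invert the extrapolation: what actually happened over `months`."""
    return annual_factor ** (months / 12)

print(round(actual_from_annualized(80, 2), 2))  # ≈ 2.08 — "80x" is roughly 2x in reality
```

The numbers (80x, two months) are the thread's framing, not audited figures; the point is just that compounding a short window makes any headline multiple look enormous.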
Musk v. Altman, week two. Greg Brockman testified that Musk himself pushed for OpenAI’s for-profit conversion and demanded “absolute control,” reframing the lawsuit as competitive retaliation tied to xAI. The more colorful detail came from Shivon Zilis, who revealed Musk had tried to recruit Altman to run an AI lab inside Tesla. MIT Technology Review’s recap notes the stakes: OpenAI’s path to a ~$1T IPO and xAI/SpaceX’s reported $1.75T target both hinge on what the court does next.
Cloudflare cuts 1,100 jobs while posting record revenue. Cloudflare laid off about 20% of its workforce — its first mass cut in 16 years — alongside record quarterly revenue of $639.8M, up 34% YoY. CEO Matthew Prince told staff the cuts weren’t about cost savings but about AI replacing roles across every team except quota-carrying sales, per TechCrunch. It’s the same record-revenue-plus-AI-layoffs pattern Meta, Microsoft, and Amazon have run this year.
Three ways to make Gemma and DeepSeek go faster. Multi-Token Prediction landed in llama.cpp, claiming a 40% speedup on Gemma 4 inference; commenters are asking the right question — show us temp-0, fixed-seed runs to confirm output parity. Separately, z-lab released gemma-4-26B-A4B-it-DFlash, with one user clocking 600 tok/s on an RTX 5090; the catch is that DFlash speeds up generation but slows prompt processing, the opposite tradeoff from MTP. And antirez published DS4, a single-purpose C inference engine targeting DeepSeek V4 Flash on 128GB MacBooks — disk-persistent KV cache, 1M context, 2-bit quant, and ~35 tok/s generation with ~300 tok/s prefill on an M5 Max. The author openly notes much of it was written with GPT.
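To get a feel for what DS4’s 2-bit quantization buys, here is an illustrative sketch — not DS4’s actual format, which the post doesn’t detail — of why 2-bit storage packs four weights into every byte (a quarter of 8-bit storage):

```python
# Illustrative symmetric 2-bit block quantization. Each weight maps to one of four
# levels {-1.5, -0.5, 0.5, 1.5} * scale, coded as 0..3, with 4 codes per byte.
# Assumes len(weights) is a multiple of 4 to keep the sketch short.

def quantize_2bit(weights, block=4):
    """Return (packed bytes, per-block scales): 4 two-bit codes per byte."""
    packed, scales = bytearray(), []
    for i in range(0, len(weights), block):
        blk = weights[i:i + block]
        scale = max(abs(w) for w in blk) / 1.5 or 1.0  # avoid zero scale
        scales.append(scale)
        byte = 0
        for j, w in enumerate(blk):
            code = min(3, max(0, round(w / scale + 1.5)))  # nearest of 4 levels
            byte |= code << (2 * j)
        packed.append(byte)
    return bytes(packed), scales

def dequantize_2bit(packed, scales, block=4):
    """Inverse: decode each byte back into 4 approximate float weights."""
    out = []
    for byte, scale in zip(packed, scales):
        for j in range(block):
            code = (byte >> (2 * j)) & 0b11
            out.append((code - 1.5) * scale)
    return out
```

Real schemes (e.g. the k-quant formats in llama.cpp) use larger blocks, offsets, and smarter level placement, but the storage arithmetic is the same: four weights per byte is what lets a model of this class fit alongside a 1M-token KV cache on a 128GB machine.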
AI2’s EMO and a vLLM-on-ROCm release. Allen AI dropped EMO, a mixture-of-experts model where modular structure emerges during pretraining rather than being assigned by domain — only 12.5% of experts need to activate per task while keeping near-full performance. It’s a 1T-token experimental run, not a finalized release, and the usual AI2 complaint applies: please ship GGUFs this time. Separately, Lemonade added vLLM ROCm as an experimental backend, with a standalone portable executable on GitHub — Strix Halo, W7800, and AI Max 395+ owners are the obvious audience.
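The 12.5% figure is the familiar sparse-MoE activation fraction. A minimal routing sketch — not EMO’s actual architecture, just the standard top-k gating it is measured against — shows where a number like 1/8 comes from:

```python
# Illustrative sparse MoE routing (NOT EMO's design; EMO's point is that the
# modular structure emerges in pretraining rather than being hand-assigned).
# With 8 experts and top-1 routing, exactly 1/8 = 12.5% of experts run per token.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=1):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Hypothetical router logits for one token over 8 experts:
gates = route_top_k([0.1, 2.3, -0.5, 0.0, 1.1, -1.2, 0.4, 0.9], k=1)
print(gates)  # only expert 1 activates; activation fraction = 1/8 = 12.5%
```

The interesting claim in EMO is the other direction: that this kind of modularity can self-organize by topic during pretraining, rather than being imposed by the router design.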
AI is breaking vulnerability disclosure norms. Jeff Kaufman argues that LLMs scanning public commits are eroding both coordinated disclosure and Linux’s “bugs are bugs” approach, citing Copy Fail — where an independent researcher rediscovered and disclosed the same flaw nine hours after the original report, blowing the embargo. HN commenters note this isn’t strictly new (people diffed kernel commits long before LLMs) but agree the cost has collapsed. The likely outcomes: shorter embargoes, a structural advantage for closed-source projects, or both.
That’s the morning. Watch the Murati and Sutskever testimony later this week — that’s where the OpenAI trial gets genuinely uncomfortable.