AI News — May 27, 2026: OpenRouter's $1.3B Bet on Model-Switching, Uber's Token Spend Faces Reckoning

Good morning. Today’s theme is the bill coming due: on AI spending, on talent mobility, and on the web’s traffic economy. Uber’s tokenmaxxing hangover got a second airing as the COO joined the company’s president in saying the quiet part out loud, OpenRouter raised at a $1.3B valuation off the same multi-model demand that’s draining everyone else’s budgets, and the Pope weighed in on what builders owe the rest of us.

Uber’s COO joins the tokenmaxxing reckoning. A day after we covered Uber president Andrew Macdonald questioning the company’s $3.4B AI spend, Business Insider ran a piece in which the COO makes essentially the same case, and the r/artificial thread is harsher than the HN one was. The most-quoted line: tokenmaxxing is the new KLOC, a metric so easy to game that it stops measuring anything useful the moment someone starts tracking it. One commenter noted that even without an official mandate, engineers would build automations to inflate their token numbers just in case. That is Goodhart’s Law arriving on schedule.

OpenRouter doubles to $1.3B on multi-model demand. OpenRouter raised a $113M Series B led by Google’s CapitalG, TechCrunch reports, more than doubling last year’s $547M valuation. The numbers behind it: 100 trillion tokens routed per month, 5x growth in six months, 8 million users, 400+ models on offer. The bet is that enterprises increasingly want to pick a different model for each task rather than get locked into one lab, which is also, conveniently, exactly the behavior that makes individual AI bills hard to predict.
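For a sense of scale, here is a quick back-of-envelope on the figures quoted above. It is only a sketch: the assumption that token growth compounds evenly month over month is ours, not something OpenRouter or TechCrunch has stated.

```python
# Figures quoted in the item above.
new_valuation = 1.3e9      # dollars
prior_valuation = 547e6    # dollars
growth_factor = 5.0        # "5x growth"
months = 6                 # "in six months"

# Step-up between the two rounds.
valuation_multiple = new_valuation / prior_valuation

# Assumption (ours): growth compounds evenly each month, so the
# monthly factor is the sixth root of the six-month factor.
monthly_factor = growth_factor ** (1 / months)
monthly_growth_pct = (monthly_factor - 1) * 100

print(f"valuation multiple: {valuation_multiple:.2f}x")
print(f"implied monthly token growth: {monthly_growth_pct:.1f}%")
```

Under that even-compounding assumption, 5x in six months works out to roughly 31% growth per month, and the valuation step-up is a little under 2.4x.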

China restricts overseas travel for AI talent at Alibaba and DeepSeek. Beijing is extending travel approval requirements — previously reserved for nuclear scientists and SOE executives — to AI staff at private firms, IBTimes reports. The r/LocalLLaMA thread is split: several commenters point out that most governments restrict travel for people with sensitive clearances, while others argue this will discourage overseas Chinese AI researchers from returning. One contrarian read worth noting: if the worry is poaching or paid defection, locking talent in place may actually keep open-weights output flowing from Chinese labs.

Sundar Pichai concedes “Google Zero” is happening. In a Verge podcast with Nilay Patel, Pichai walked back his earlier dismissal of “Google Zero” — the idea that search would answer queries directly and stop sending traffic to the open web — and acknowledged publishers like Condé Nast are now planning around it as a baseline. He also confirmed he restructured Google’s leadership in response to ChatGPT, which is about as direct an admission as you’ll get from a sitting CEO that the company was caught flat-footed.

Pope Leo XIV publishes an AI encyclical. Magnifica Humanitas runs through AI governance, labor, misinformation, autonomous weapons, and transhumanism, with the central claim that technology “is never neutral, because it takes on the characteristics of those who devise, finance, regulate, and use it.” The full text drew a notably warm HN reception, including from a self-described atheist who called the Vatican’s takes on technology better than any government’s. Another commenter summarized the ethical framework as a Catholic restatement of Asimov’s Three Laws, which is either a compliment or not depending on which Asimov story you’ve read most recently.

Sleep for LLMs. A UMD/CMU paper on arXiv proposes periodic “sleep” phases where an LLM compresses accumulated context into fast weights inside state-space blocks and clears the KV cache, sidestepping attention’s quadratic cost. Longer sleep yields better performance on multi-hop graph retrieval and math problems where standard transformers stall out. HN commenters were quick to point at prior work — E2E-TTT, Letta’s sleep-time compute paper — and one suggested the natural next step is offline LoRA fine-tuning on compacted context, giving you a three-tier memory hierarchy of base weights, LoRA-encoded mid-term knowledge, and live KV cache.
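To see why clearing the KV cache matters, here is a toy cost count. It is not the paper's method: the function name, the fixed sleep interval, and the idea of counting query-key pairs as the cost are all our simplifications, and it ignores whatever the sleep phase itself costs.

```python
from typing import Optional


def attention_pairs(total_tokens: int, sleep_every: Optional[int] = None) -> int:
    """Count query-key pairs scored while generating total_tokens tokens.

    Toy model: each new token attends to every token currently in the
    KV cache, itself included. If sleep_every is set, the cache is
    emptied once it reaches that length, standing in for a sleep phase.
    """
    cache_len = 0
    pairs = 0
    for _ in range(total_tokens):
        cache_len += 1
        pairs += cache_len
        if sleep_every is not None and cache_len == sleep_every:
            cache_len = 0  # "sleep": context is compressed elsewhere, cache cleared
    return pairs


no_sleep = attention_pairs(8192)
with_sleep = attention_pairs(8192, sleep_every=1024)
print(no_sleep, with_sleep, round(no_sleep / with_sleep, 2))
```

Without sleep the count grows with the square of the context length. With a fixed sleep interval it grows linearly in the total length, which is the quadratic cost the paper is described as sidestepping.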

Open weights corner. PrismML released Bonsai Image 4B, a 1-bit/ternary text-to-image diffusion transformer that runs in-browser on WebGPU — though the demo is buggy (Chrome leaks memory, Firefox falls back to slow CPU) and one commenter alleges it’s an undisclosed quantization of FLUX.2 Klein 4B with no credit to the original team. On the LLM side, llmfan46 dropped an uncensored Qwen3.5-35B-A3B with all 785 MTP tensors preserved, claiming 85% fewer refusals at 0.0487 KL divergence from base, and notably shipping in NVFP4 GGUF — a format even Unsloth isn’t producing yet.
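For readers who want a feel for what a KL figure like 0.0487 means, here is a minimal sketch of KL divergence between two next-token distributions. The probabilities are made up for illustration, and the release notes as summarized here do not say which direction was measured or over which prompts, so treat the setup as an assumption.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token probabilities over a three-token vocabulary.
base = [0.70, 0.20, 0.10]       # stand-in for the base model
modified = [0.60, 0.25, 0.15]   # stand-in for the modified model

kl = kl_divergence(base, modified)
print(f"KL(base || modified) = {kl:.4f} nats")
```

A value of zero means the two distributions are identical, and small values mean the modified model's predictions stay close to the base model's. A reported figure would typically be an average over many token positions rather than a single distribution like this one.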

That’s the morning. The Uber story is the one to keep watching: if a second exec is publicly questioning AI spend, expect the next earnings cycle to surface a lot more executives doing the same.