AI News — May 18, 2026: Claude Overtakes ChatGPT in ARR and DAUs, NIMBY Opposition Hits 70%

Good morning. The market story today is Anthropic finally overtaking OpenAI on the metrics that actually pay the bills, while local-inference benchmarks continue eating Reddit alive — this time with M5 Max versus DGX Spark numbers and a writeup arguing your fancy Apple Silicon is a worse deal than just paying OpenRouter. Plus 70% of Americans don’t want a data center anywhere near them, and a Qwen render is apparently haunting people’s dreams.

ChatGPT slips to second. For the first time in years, ChatGPT is not out in front: Claude has passed it on a long list of business metrics, including net new ARR, mobile downloads, daily active users, and enterprise adoption, according to a roundup circulating on Reddit. The consensus diagnosis is that Anthropic built for agentic coding and developer workflows while OpenAI tried to disrupt search, shopping, and social all at once. One commenter put it bluntly: “They were so excited about their stunning lead, they forgot they were in a race.” Worth noting: many users report Opus 4.7 as a regression, and some have switched back to 4.6, so the crown could move again.

70% of Americans say not in my backyard. A new Gallup poll has opposition to local AI data centers jumping from 47% late last year to roughly 70% now, with concerns spanning electricity, water, utility bills, and pollution, PC Guide reports. Local moratoriums are spreading. The comment section pushed back hard on the framing, arguing that data centers predate AI and that Instagram scrolling burns several times more energy per session. But as one Redditor noted, swap “data center” for “factory” and you’d get the same 70%. People don’t want big ugly buildings nearby, regardless of what’s inside.

M5 versus everything else. A community benchmark comparing M5 Max, DGX Spark, Strix Halo, and RTX 6000 for local inference confirms what you’d expect: the RTX 6000 wins when the model fits in VRAM, while the M5 Max holds steady when the model no longer fits. The big asterisk is that prefill/cache numbers were omitted, which commenters flagged as a serious gap, since that’s exactly where the Spark might claw back ground on long-context work. Pricing math is messy: an M5 Max with 128GB runs about $5,500 and a single Spark about $3,800, but some argue Sparks only make sense in pairs.

Apple Silicon costs more than OpenRouter, says one developer. A blog post by William Angel concludes local inference on an M5 Max runs roughly 3× more expensive per million tokens and 3-7× slower than OpenRouter, with hardware depreciation dwarfing electricity. HN commenters dismantled the analysis on several fronts: it expenses the entire laptop against inference, uses high-end power estimates, and — most damningly — only measures output tokens. Agentic coding workloads are dominated by input tokens, which are effectively free locally. As one commenter pointed out, frontier providers are also selling at a loss, so today’s pricing isn’t a fair baseline for a five-year hardware comparison.
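For anyone who wants to poke at the disputed assumptions themselves, here is a minimal sketch of the kind of per-token math at issue. It is not the blog post's actual model: every number below is an illustrative placeholder, and the names (LocalSetup, inference_share, api_cost_usd) are made up for this example. It shows how the share of the laptop charged to inference moves the depreciation term, and how a hosted bill splits between input and output tokens.

```python
from dataclasses import dataclass


@dataclass
class LocalSetup:
    """All fields are illustrative placeholders, not figures from the blog post."""
    hardware_usd: float        # purchase price of the machine
    inference_share: float     # fraction of that price charged to inference (0 to 1)
    lifetime_years: float      # period over which the hardware is written off
    busy_hours_per_day: float  # hours per day actually spent generating
    watts: float               # average power draw while generating
    usd_per_kwh: float         # electricity price
    output_tok_per_s: float    # decode speed


def local_cost_breakdown(s: LocalSetup) -> dict:
    """Return depreciation, electricity, and total in USD per million output tokens."""
    busy_hours = s.busy_hours_per_day * 365 * s.lifetime_years
    millions_of_tokens = s.output_tok_per_s * 3600 * busy_hours / 1e6
    depreciation = s.hardware_usd * s.inference_share / millions_of_tokens
    electricity = s.watts / 1000 * busy_hours * s.usd_per_kwh / millions_of_tokens
    return {
        "depreciation": depreciation,
        "electricity": electricity,
        "total": depreciation + electricity,
    }


def api_cost_usd(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Bill for one hosted request that prices input and output tokens separately."""
    return input_tokens / 1e6 * usd_per_m_input + output_tokens / 1e6 * usd_per_m_output


# Hypothetical setups: same machine, but different shares of its price
# charged to inference (the whole laptop versus a quarter of it).
whole_laptop = LocalSetup(
    hardware_usd=5500, inference_share=1.0, lifetime_years=5,
    busy_hours_per_day=4, watts=100, usd_per_kwh=0.15, output_tok_per_s=30,
)
quarter_laptop = LocalSetup(
    hardware_usd=5500, inference_share=0.25, lifetime_years=5,
    busy_hours_per_day=4, watts=100, usd_per_kwh=0.15, output_tok_per_s=30,
)

print(local_cost_breakdown(whole_laptop))
print(local_cost_breakdown(quarter_laptop))

# Hypothetical agentic request: input-heavy, with made-up per-million prices.
print(api_cost_usd(50_000, 1_000, usd_per_m_input=0.50, usd_per_m_output=2.00))
```

With these placeholder inputs, depreciation is the dominant term and scales directly with inference_share, which is why the "expenses the entire laptop" objection matters so much. The sketch deliberately prices only local output tokens, mirroring the limitation the commenters flagged; it does not model local prefill time.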

More MTP, more numbers. Following yesterday’s llama.cpp MTP merge, a follow-up PR landed that avoids unnecessary logit copies during prompt decode, and a benchmark thread on Qwen3.6 shows wildly varying results — 77 vs 27-30 tok/s on 27B Q5 for some, near-zero gain on 35B for others. Strix Halo users are reporting +10-15 tok/s gains. The recurring question is whether MTP’s token-generation wins justify the prompt-processing hit, which on some setups is severe.
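One way to frame that trade-off is end-to-end request time: prompt processing plus generation. The sketch below is a back-of-the-envelope model, not anything from llama.cpp itself. The generation speeds (27 and 77 tok/s) are borrowed from the thread on the assumption that the higher figure is the MTP run; the prompt-processing speeds are invented purely for illustration, and the function names are hypothetical.

```python
import math


def request_seconds(prompt_tokens, output_tokens, pp_tok_per_s, tg_tok_per_s):
    """Simple model: time to process the prompt plus time to generate the reply."""
    return prompt_tokens / pp_tok_per_s + output_tokens / tg_tok_per_s


def break_even_output_tokens(prompt_tokens, pp_base, pp_mtp, tg_base, tg_mtp):
    """Reply length above which the MTP run finishes sooner than the baseline."""
    # Extra seconds spent on the prompt when MTP is enabled.
    extra_prefill = prompt_tokens * (1 / pp_mtp - 1 / pp_base)
    # Seconds saved on each generated token when MTP is enabled.
    saved_per_token = 1 / tg_base - 1 / tg_mtp
    if saved_per_token <= 0:
        return math.inf  # no generation speedup, so MTP never catches up
    return max(0.0, extra_prefill / saved_per_token)


# Assumed numbers: 8,000-token prompt; prompt processing drops from a
# made-up 1000 tok/s to a made-up 700 tok/s; generation rises from 27 to 77 tok/s.
n = break_even_output_tokens(8000, pp_base=1000, pp_mtp=700, tg_base=27, tg_mtp=77)
print(f"MTP wins once the reply exceeds about {n:.0f} tokens")
```

Under this model, long prompts with short replies are where the prompt-processing hit hurts most, while long generations amortize it quickly. Plug in your own measured pp and tg numbers rather than trusting the placeholders.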

AI won’t fix your broken process. A blog post arguing that AI won’t speed up software delivery because coding was never the bottleneck — unclear requirements were — got broad agreement on HN, with several commenters noting coding is well under half the actual work on most projects. The dissenting view: the post only considers AI’s impact on development, ignoring its effect on ideation, documentation, legal, and deployment. The best analogy in the thread: AI is a machine gun replacing a pistol — more firepower, harder to aim.

Raschka on what’s actually changing in transformers. Sebastian Raschka published a technical writeup on recent architecture work covering KV sharing and per-layer embeddings in Gemma 4, compressed convolutional attention in ZAYA1-8B, layer-wise attention budgeting in Laguna XS.2, and mHC with compressed attention in DeepSeek V4. It’s a clean tour of how labs are squeezing long-context efficiency out of the transformer block, driven by reasoning and agent workloads. Refreshingly free of benchmark cherry-picking.

A WebGL face the community wants to adopt as a mascot. Someone asked Qwen3.5-122B to generate a photorealistic real-time human face in WebGL, and the result unsettled enough commenters that they’re now proposing to name it “NoBlink” and adopt it as the local-inference community monster. “I can’t tell what’s real and what isn’t anymore,” wrote one. Source code requests are piling up.

Siri gets auto-deleting chats. Apple’s revamped Siri, expected in iOS 27, will offer chat retention options of 30 days, one year, or indefinitely, The Verge reports. Apple is leaning hard on privacy as its differentiator even as it quietly routes some queries through Google’s Gemini, betting that anxiety over AI data practices will eventually outweigh capability gaps.

That’s it for this morning. If you’re benchmarking anything on an M5, please include your prefill numbers.