AI News — May 25, 2026: HBM Hits 63% of AI Chip BOM, DeepSeek Reasonix Targets Cache Wins

Good morning. DeepSeek keeps eating the frontier from below — yesterday’s permanent price cut already has a coding agent built around it, while upstream the real bottleneck on everyone else’s costs is becoming clear: memory. HBM is now nearly two-thirds of what an AI chip costs to build, which is bad news for your next RAM upgrade and good news for anyone betting the current price curve has room to fall.

Memory eats the AI chip BOM. Epoch AI’s latest data shows HBM’s share of AI chip component costs climbing from 52% in Q1 2024 to 63% in Q4 2025, with absolute spending jumping from $12B to $32B over the same period. Microsoft and Meta have already baked another ~$35B of memory inflation into their 2026 capex. One HN commenter who paid $250 for 96GB of RAM a couple of years ago is now looking at $1,200 for the same kit. The optimistic read: a possible 3x hardware cost reduction once DRAM supply catches up, with no algorithmic breakthroughs required.
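To make the share-versus-spend numbers concrete, here is a back-of-envelope sketch. It assumes (the digest doesn’t say) that the $12B and $32B figures are HBM spend rather than total component cost, and that a memory price change leaves volumes and every other component cost unchanged. The helper names are hypothetical.

```python
# Back-of-envelope check on the HBM figures quoted above.
# ASSUMPTION: $12B / $32B are HBM spend (not total BOM), and a memory price
# change leaves volumes and all non-memory component costs unchanged.


def implied_bom(hbm_spend_b: float, hbm_share: float) -> tuple[float, float]:
    """Return (total component cost, non-HBM cost) in $B implied by HBM spend and its share."""
    total = hbm_spend_b / hbm_share
    return total, total - hbm_spend_b


def bom_multiplier(hbm_share: float, memory_price_factor: float) -> float:
    """New BOM as a fraction of the old one if only memory prices scale by memory_price_factor."""
    return (1.0 - hbm_share) + hbm_share * memory_price_factor


for label, spend, share in [("Q1 2024", 12.0, 0.52), ("Q4 2025", 32.0, 0.63)]:
    total, other = implied_bom(spend, share)
    print(f"{label}: total ≈ ${total:.1f}B, non-HBM ≈ ${other:.1f}B")

print(f"RAM kit price multiple: {1200 / 250:.1f}x")  # the HN commenter's anecdote
print(f"BOM if memory halves in price: {bom_multiplier(0.63, 0.5):.3f}x")
```

Under those assumptions this prints an implied total of about $23.1B (non-HBM about $11.1B) for Q1 2024 and about $50.8B (non-HBM about $18.8B) for Q4 2025, a 4.8x kit price multiple, and a BOM of 0.685x if memory alone halves in price. At a 63% share, memory pricing dominates the arithmetic.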

A DeepSeek-native coding agent appears. Following yesterday’s permanent V4 Pro price cut, DeepSeek Reasonix showed up on HN as a coding agent built specifically around DeepSeek’s API, with aggressive prefix-cache optimization. Reception was mixed. One commenter said they’d built a small Codex-to-DeepSeek bridge over a weekend and got most of the same cache benefits without a dedicated tool; others piled on the website (apparently Codex-generated, complete with animated typing that reflows the page) and on the choice of Python over a self-contained Rust or Go binary. A useful nugget buried in the thread: DeepSeek’s API reportedly forces coding clients to maximum thinking effort regardless of settings, which would explain the long reasoning cycles people see in OpenCode.
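Why prefix caching rewards a particular agent design is easiest to see in a toy model. Caches of this kind generally reuse work only for the leading tokens a request shares with an earlier one, so an agent gains most by keeping its system prompt stable and only appending to the transcript. The sketch below is not how Reasonix or DeepSeek’s cache actually works. It assumes whitespace splitting as a stand-in tokenizer and remembers only the immediately preceding request; real provider caches have their own granularity and eviction rules.

```python
# Toy model of prefix caching: each request is "cached" for as many leading
# tokens as it shares with the previous request.
# ASSUMPTIONS: whitespace split stands in for a real tokenizer, and only the
# immediately preceding request is remembered (real caches differ).

SYSTEM = "You are a coding agent. Follow the repo conventions."


def tokens(messages: list[str]) -> list[str]:
    """Flatten a request's messages into a crude token list."""
    return [tok for msg in messages for tok in msg.split()]


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens two requests have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit_ratio(requests: list[list[str]]) -> float:
    """Fraction of all sent tokens that match the previous request's prefix."""
    hits = total = 0
    prev: list[str] = []
    for msgs in requests:
        toks = tokens(msgs)
        hits += shared_prefix_len(prev, toks)
        total += len(toks)
        prev = toks
    return hits / total


history = [f"turn {i}: edit file_{i}.py and rerun tests" for i in range(5)]

# Append-only: stable system prompt first, new turns only ever added at the end.
append_only = [[SYSTEM] + history[: i + 1] for i in range(5)]
# Volatile header: a per-request value (e.g. a timestamp) placed before everything else.
volatile = [[f"time={i}", SYSTEM] + history[: i + 1] for i in range(5)]

print(f"append-only: {cache_hit_ratio(append_only):.2f}")
print(f"volatile header: {cache_hit_ratio(volatile):.2f}")
```

The append-only transcript reuses 106 of 150 tokens (about 0.71), while a single changing value at the top drops reuse to zero. That gap, rather than any exotic trick, is roughly why a small bridge that preserves prompt order can capture much of the benefit.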

Constraint decay in coding agents. A new arXiv paper measures what experienced agent users have been muttering about for a year: LLM coding agents lose ~30 percentage points of performance as architectural constraints accumulate, with some weaker models hitting near-zero on fully specified tasks. Agents do well on Flask and badly on Django and FastAPI, and data-layer code is the most common failure mode. Practitioners in the thread converged on the same mitigations: @-mention idiomatic exemplar files rather than trying to steer with markdown rules, and feed constraints incrementally instead of dumping them all upfront.
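The practitioners’ workflow can be sketched as a staged conversation rather than one big rules dump. The function below is hypothetical, as are the file paths and constraints in the usage example. The `@path` syntax mirrors the file-mention feature many agent UIs offer, but the exact syntax and behavior vary by tool, so treat it as an assumption.

```python
# Hypothetical helper: turn one big rules dump into a staged conversation.
# ASSUMPTION: the agent UI resolves "@path" mentions to file contents;
# syntax and behavior vary by tool.


def staged_prompts(task: str, exemplars: list[str], constraints: list[str]) -> list[str]:
    """Return turns in order: task plus exemplar mentions first, then one constraint per turn."""
    mentions = " ".join(f"@{path}" for path in exemplars)
    turns = [f"{task}\nMatch the structure and idioms of {mentions}."]
    for constraint in constraints:
        # One new constraint per turn, so each can be verified before the next lands.
        turns.append(f"Now also ensure: {constraint}")
    return turns


if __name__ == "__main__":
    for turn in staged_prompts(
        "Add a GET /users/{id} endpoint.",
        ["app/routes/orders.py", "app/repositories/order_repo.py"],  # hypothetical paths
        ["reuse the existing DB session dependency", "return 404 through the shared error handler"],
    ):
        print(turn, end="\n---\n")
```

The design choice is to let the exemplar files carry most of the architectural intent, so each later turn adds a single checkable requirement instead of asking the model to juggle all of them at once.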

Greg Brockman does the podcast circuit. Brockman gave a long interview covering the 72 hours after Altman’s firing, the “Phoenix” backup-company plan, and why OpenAI stopped showing reasoning traces. HN’s reaction was mostly a shrug — “feels v boring,” said one — with sharper commenters pointing at the unanswered question of why a nonprofit structure was allowed to convert into the current arrangement at all, and what precedent that sets. Brockman’s leaked personal diary from the Musk lawsuit, including the line “Financially what will take me to $1B?”, also got an airing.

Grok struggles to find users in government. A Reuters analysis cited by The Verge found Grok in just 3 of 400+ documented federal AI deployments, against 230+ for OpenAI. The data excludes the Pentagon, where xAI holds a $200M contract. And as several Reddit commenters fairly pointed out, federal workers are not a neutral evaluation panel after DOGE spent half a year trying to fire them. The more interesting defense of Grok in the thread was its real-time Twitter integration for current-events queries, though commenters broadly agreed it’s poor at code and prose.

Two security notes worth your time. A developer building a Linux sandbox walks through why network allow-lists don’t stop exfiltration: credentials can be DNS-encoded into subdomain lookups or piggybacked on allow-listed analytics endpoints, with the November Shai-Hulud npm worm as the case study. Separately, The Verge traces the evolution of chatbot jailbreaks from the crude “DAN” and “grandma” exploits to social-engineering attacks that target the contextual reasoning and “personality” layers themselves. The attack surface grows with the helpfulness.
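The DNS channel works because resolving any name under an attacker-controlled zone delivers the subdomain text to that zone’s nameserver, whatever the HTTP allow-list says. The sketch below shows the shape of such a query and a crude detection heuristic that flags long or high-entropy labels. It is not the sandbox author’s method: the thresholds are arbitrary assumptions, treating the last two labels as the zone is a simplification (it misreads names like `example.co.uk`), it will false-positive on some CDN hostnames, and it does nothing about the analytics-endpoint piggybacking also described above.

```python
import base64
import math
from collections import Counter


def encode_as_dns_labels(secret: bytes, zone: str, max_label: int = 63) -> str:
    """Illustrate an exfil query's shape: base32 the bytes, split into <=63-char labels."""
    encoded = base64.b32encode(secret).decode("ascii").rstrip("=").lower()
    labels = [encoded[i : i + max_label] for i in range(0, len(encoded), max_label)]
    return ".".join(labels + [zone])


def shannon_entropy(s: str) -> float:
    """Bits per character of s."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_exfil(
    qname: str,
    zone_depth: int = 2,  # ASSUMPTION: the zone is the last two labels (e.g. example.com)
    max_label_len: int = 40,  # arbitrary threshold
    entropy_threshold: float = 3.5,  # arbitrary threshold, in bits per character
) -> bool:
    """Crude heuristic: flag long or high-entropy subdomain labels below the zone."""
    labels = qname.rstrip(".").split(".")
    sub = labels[:-zone_depth]
    if not sub:
        return False
    if any(len(label) > max_label_len for label in sub):
        return True
    joined = "".join(sub)
    return len(joined) >= 16 and shannon_entropy(joined) > entropy_threshold


if __name__ == "__main__":
    q = encode_as_dns_labels(b"AKIA-fake-credential-0123456789", "attacker.example")
    print(q)
    print(looks_like_exfil(q), looks_like_exfil("www.example.com"))
```

The fake 31-byte credential becomes a single 50-character label, which the length check flags, while `www.example.com` passes. Heuristics like this are a monitoring aid at best. Blocking outbound DNS resolution from the sandbox entirely is the stronger control.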

Local LLM corner. A claim of 1,000 tokens/s on Qwen3.6 27B with V100s drew immediate skepticism: Volta doesn’t support AWQ, which makes the headline number hard to account for. Its main effect, though, was driving up used V100 prices on eBay. In a parallel thread, the perennial “is NVIDIA still the default?” question got the usual answer: yes for CUDA workloads, but AMD’s MI50/MI60 and dual R9700 setups are increasingly fine for llama.cpp/Vulkan inference, and NVIDIA’s stinginess on consumer VRAM keeps the door open.

That’s it for today. If the memory numbers hold, a lot of next year’s “AI is getting cheaper” stories will really be “DRAM fabs finally caught up” stories.