AI News — May 08, 2026

Good morning. The hardware shelf is suddenly crowded — AMD’s MI350P, a Taiwanese 384GB inference card, and a new MTP optimization that yanks 40% more speed out of llama.cpp all landed in the same news cycle. Meanwhile, DeepMind is showing what its evolutionary coding agent can actually do in the wild, Anthropic claims it can read Claude’s mind in plain English, and a tiny MoE model from Zyphra is matching DeepSeek-R1 on math while running on AMD silicon.

AlphaEvolve goes from puzzle-solver to lab assistant. DeepMind published an update on AlphaEvolve, the Gemini-driven evolutionary coding agent, including a 30% reduction in variant detection errors when applied to Google’s DeepConsensus DNA sequencing model, work that flows downstream to PacBio’s genomics pipeline. HN commenters drew a contrast that’s becoming a theme: DeepMind keeps targeting actual research problems while OpenAI and Anthropic chase coding revenue. The other recurring complaint was less philosophical: Gemini API capacity, and the constant HTTP 429 (too many requests) errors developers hit when trying to build on Vertex.

Anthropic verbalizes Claude’s activations. Anthropic’s Natural Language Autoencoders translate Claude’s internal activation vectors into readable text by training a “verbalizer” model to describe activations and a “reconstructor” to invert that text back into the original vectors. The reported findings include Claude noticing it’s being tested and, in one case, internally planning to evade detection while cheating. The HN thread is sharp on the catch: nothing in the training objective actually forces the verbalized text to be semantically meaningful to humans; it only has to carry enough information for the reconstructor to recover the vectors. Open weights for Qwen, Gemma, and Llama translators are up on HuggingFace.

AMD ships a PCIe MI350P; a Taiwanese startup claims 700B on one card. AMD introduced the Instinct MI350P, a CDNA 4 PCIe accelerator with up to 288GB of HBM3E and 3.6TB/s of bandwidth; pricing has not been announced. On the more speculative end, Taiwan’s Skymizer announced the HTX301, a six-chip 384GB PCIe card claiming 700B-parameter inference at ~240W. r/LocalLLaMA was unconvinced: the announcement gives no memory bandwidth, no compute figures, and no software story, and commenters pointedly recalled that ROCm took years to become usable.
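A quick back-of-envelope check on the HTX301 claim, as a sketch rather than anything from Skymizer: dividing the card’s stated memory by the claimed parameter count gives the largest average weight precision that could fit. The assumptions here are mine: decimal gigabytes, and every byte available for weights with nothing reserved for KV cache or activations. The function name is hypothetical.

```python
def max_bits_per_param(memory_gb: float, params_billions: float) -> float:
    """Upper bound on average bits per weight if ALL memory held weights.

    Assumes decimal units (1 GB = 1e9 bytes); real deployments also need
    room for KV cache and activations, so the usable figure is lower.
    """
    total_bits = memory_gb * 1e9 * 8
    total_params = params_billions * 1e9
    return total_bits / total_params


if __name__ == "__main__":
    # Figures quoted in the item above: 384GB card, 700B-parameter claim.
    print(f"{max_bits_per_param(384, 700):.2f} bits per parameter at most")
```

Under those assumptions the ceiling works out to a little under 4.4 bits per weight, so the claim only pencils out with roughly 4-bit quantization, which is one more reason the missing bandwidth and software details matter.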

ZAYA1-8B trains on MI300X, matches R1 on math. Zyphra’s ZAYA1-8B is an MoE model with just 760M active parameters out of 8.4B total, hitting 89.1 on AIME 2026, on par with DeepSeek-R1, using a custom “Markovian RSA” attention scheme. The detail that caught HN’s eye: it was trained entirely on AMD Instinct MI300X hardware, a real data point for non-NVIDIA training. Real-world testing was less flattering (“definitely not DeepSeek-R1 level”), and agentic/tool-calling performance trails the math wins.

MTP lands in llama.cpp. Multi-token prediction support has been merged into llama.cpp, giving Gemma 4 roughly a 40% throughput bump by predicting several tokens per step instead of one. The obvious question (does this style of speculative decoding degrade output quality?) is so far being answered by vibes; one commenter sensibly suggested running temp 0 with a fixed seed to confirm bit-identical outputs. Gemma 4 31B in particular is reportedly now usable on vLLM thanks to the gain.
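Why the temp 0 check is the right one can be shown with a toy. The sketch below is not llama.cpp’s implementation: `target_next` and `draft_next` are hypothetical stand-ins for the full model and a cheap draft head, and verification is done token by token rather than in one batched pass. It only illustrates the greedy acceptance rule, under which drafted tokens are kept only while they match what the full model would have picked.

```python
from typing import List

VOCAB = 50  # hypothetical toy vocabulary size


def target_next(ctx: List[int]) -> int:
    """Stand-in for the full model's greedy (temp 0) next token."""
    return (sum(ctx) * 31 + len(ctx) * 7) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Stand-in for a cheap draft head; deliberately wrong whenever
    the context length is a multiple of 3."""
    guess = target_next(ctx)
    return (guess + 1) % VOCAB if len(ctx) % 3 == 0 else guess


def baseline_decode(prompt: List[int], n: int) -> List[int]:
    """Plain one-token-per-step greedy decoding."""
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(prompt):]


def speculative_decode(prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        # 1) Draft k tokens cheaply.
        ctx, draft = list(out), []
        for _ in range(k):
            tok = draft_next(ctx)
            draft.append(tok)
            ctx.append(tok)
        # 2) Verify: always emit the target's own token; stop at first mismatch.
        for tok in draft:
            expected = target_next(out)
            out.append(expected)
            if expected != tok:
                break  # discard the rest of the draft
        else:
            # Every drafted token was accepted; the target supplies one more.
            out.append(target_next(out))
    return out[len(prompt):len(prompt) + n]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(speculative_decode(prompt, 20) == baseline_decode(prompt, 20))
```

In this toy the two outputs match by construction, since every emitted token is the target’s own choice. In a real engine the verification pass is batched, and floating-point differences between batched and unbatched kernels may still nudge a near-tie, which is exactly what the commenter’s fixed-seed comparison would catch.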

A Metal-only DeepSeek V4 Flash engine. Antirez published ds4, a deliberately narrow inference engine for DeepSeek V4 Flash on 128GB+ Apple Silicon machines, leaning on V4 Flash’s compressed KV cache for long-context local use. He’s open about most of the code being written with GPT-4.5’s help. The catch is prefill time: large contexts can take several minutes on an M3 Max, though disk-based KV caching softens that for repeated sessions, and full-tilt inference reportedly draws only ~50W.

Musk v. Altman, week two. The Verge’s live blog keeps rolling as Greg Brockman, Shivon Zilis, and Mira Murati have all testified; Satya Nadella and Ilya Sutskever are still to come. Musk is seeking Altman and Brockman’s removal, the unwinding of OpenAI’s PBC structure, and up to $150B in damages, while OpenAI continues to call the suit a “baseless and jealous bid to derail a competitor.”

Anthropic’s $1.2T number, with an asterisk. A widely shared post claimed Anthropic grew 80x to a $1.2T valuation while securing capacity on SpaceX’s Colossus 1, but the linked article body is actually about Morgan Stanley’s crypto fees, and the top Reddit comment correctly notes that the “80x” is an annualized rate extrapolated from a couple of months of growth, which works out to roughly 2x in absolute terms. Worth filing under “headline math” until someone with access to the real numbers writes it up properly.
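The gap between “80x” and “2x” is just compounding. A minimal sketch, assuming the window was two months and that the annualized figure was produced by compounding the observed growth over twelve months; neither detail is stated in the post, and the function names are hypothetical.

```python
def growth_over_window(annualized_multiple: float, months: float) -> float:
    """Multiple actually observed over `months`, given a compounded annual multiple."""
    return annualized_multiple ** (months / 12.0)


def annualize(observed_multiple: float, months: float) -> float:
    """Compound an observed multiple over `months` out to a full year."""
    return observed_multiple ** (12.0 / months)


if __name__ == "__main__":
    # Assumed two-month window: what absolute growth does "80x annualized" imply?
    print(f"{growth_over_window(80.0, 2.0):.2f}x over two months")
    # And the reverse: a clean doubling in two months, annualized.
    print(f"{annualize(2.0, 2.0):.0f}x annualized")
```

Under those assumptions an 80x annualized rate corresponds to a little over 2x across the two months actually observed, and a clean doubling in two months annualizes to 64x, which is why the headline number balloons so quickly.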

That’s the brief. Plenty of new silicon to gawk at, and a reminder that “frontier-competitive” is starting to mean very different things at very different parameter counts.