AI News — May 26, 2026: Claude Earns CVE Credit in macOS Kernel, Copilot Cowork Ships Prompt Injection Flaw

Good morning. The week opens with a small but real milestone: an AI helped find a kernel bug in macOS, and Apple credited it by name. Elsewhere, Uber’s tokenmaxxing hangover is now its COO’s problem, Norway is spending real money to keep Norwegian inside an LLM, and Microsoft’s rushed Copilot Cowork is doing exactly what you’d expect.

Claude gets a CVE credit in macOS. Apple’s security notes for macOS Tahoe 26.5 credit CVE-2026-28952, a kernel integer overflow, to “Calif.io in collaboration with Claude and Anthropic Research” — a rare public instance of an AI system named on a real kernel vulnerability disclosure. Calif.io showed up in the HN thread to confirm the bug is unrelated to their recent MIE attack (those bugs aren’t patched yet), and one commenter pointed at Google’s Chrome numbers — 302 vulnerabilities patched in a recent window, 225 found internally, versus 19 in the same period last year — as evidence that AI-assisted fuzzing is already changing the math at scale. The bug also affects iOS 18.7.9, Sequoia, and Sonoma, so the “I’m safe because I didn’t upgrade to Tahoe” crowd should probably patch anyway.

Microsoft Copilot Cowork exfiltrates files via prompt injection. PromptArmor demonstrated that a poisoned “skill” can make Microsoft’s new Cowork agent send pre-authenticated SharePoint download links to the attacker over email or Teams, bypassing Microsoft’s documented approval step because messages to the active user go out without confirmation. The attack worked at high rates even against Claude Opus 4.7. HN was split on whether this counts as a vulnerability or as the expected behavior of installing a malicious plugin, but the more grounded comment was that data exfiltration should be the first thing you design against in any agent product with enterprise-wide delegated permissions, and clearly wasn’t here.

Uber’s COO sours on tokenmaxxing. Andrew Macdonald told Business Insider the ROI on Uber’s AI token spend is getting harder to defend, after the company burned its full annual R&D budget — $3.4B — in four months. Reddit’s diagnosis was Goodhart’s Law in a hat: measuring engineers by token consumption produces high token consumption, not productive engineers. One commenter called it “the new KLOC,” and another suggested 500 of Uber’s 5,000 engineers could probably do the work if they weren’t being graded on how big a fire they could set.

Norway buys 2 PB of Huawei flash for a sovereign LLM. Norway’s National Library is building a Norwegian-language LLM on 2 PB of Huawei OceanStor Dorado storage, feeding the national Sigma2 Olivia supercomputer (HPE Cray, 448 GPUs), with training rights negotiated directly with Norwegian newspapers, Blocks & Files reports. HN’s main objection: 448 GPUs is enough for a LoRA on an open-weights base, not a from-scratch frontier model in a small language. A more interesting suggestion in the thread: bundle the dataset and ship it free to every major model builder, so Norwegian shows up in the frontier models people actually use. Welsh has been doing roughly this with Nemotron.

A 4B VLM for OCR that fits in 4GB. NuExtract3 dropped as an open-weight 4B vision-language model targeting structured extraction, OCR, and Markdown conversion, with GGUF and MLX weights on day one and a 4GB VRAM floor. Early r/LocalLLaMA testers say it beat their Qwen and Gemma baselines on real documents; open questions are multi-column layouts, dense tables, and non-Latin scripts. The use case most people brought up was replacing Gemini Flash for batch document pipelines where API costs add up. Separately, MiniCPM5-1B landed with 131k context on ~680M non-embedding params — interesting less as a chat model than as a cheap local router that can swallow a whole repo before deciding which bigger model to hand off to.
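For a rough sense of why a 4B model can clear a 4GB floor, here is a back-of-the-envelope sketch. It counts weights only, and the bit widths are assumptions for illustration: the announcement does not say which quantization the 4GB figure refers to, and real usage adds KV cache, activations, the vision encoder's working memory, and quantization metadata on top.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# "4B" is treated as exactly 4e9 parameters here (an assumption).
N_PARAMS = 4e9

# Bit widths are illustrative, not the published quantization levels.
for bits in (16, 8, 4):
    size = weight_footprint_gib(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.2f} GiB")
```

Under these assumptions, 16-bit weights alone overshoot 4GB, 8-bit weights land just under it, and 4-bit weights come to roughly 1.9 GiB, which leaves room for context and activations.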

Sparse attention without the retraining tax. A new arXiv paper introduces RTPurbo, which converts full-attention LLMs into sparse ones in a few hundred training steps, claiming 9.36× prefill speedup at 1M context and ~2× decode speedup with near-lossless quality. The clever bit, per the top commenter, isn’t the speedup number but the 16-dimensional token indexer with dynamic top-p selection — the premise is that only specific “retrieval heads” actually need full KV cache, and everything else can be approximated. If it survives messy real-world data, long-context serving gets meaningfully cheaper.
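To make "dynamic top-p selection" concrete, here is a minimal sketch of the general idea, not RTPurbo's actual implementation. It assumes each token has a small index vector (16 dimensions, matching the figure above), scores keys against the query with a scaled dot product, and keeps the smallest set of tokens whose softmax mass reaches p. The function names, the scoring rule, and the p=0.9 default are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_top_p(query_idx, key_idx, p=0.9):
    """Return sorted positions of the smallest key set whose score mass reaches p."""
    scale = 1.0 / math.sqrt(len(query_idx))
    probs = softmax([dot(query_idx, k) * scale for k in key_idx])
    # Visit tokens from most to least probable, stopping once mass >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return sorted(kept)


random.seed(0)
DIM, N_KEYS = 16, 64
keys = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_KEYS)]
peaked_query = [8 * x for x in keys[5]]  # strongly aligned with one key
flat_query = [0.0] * DIM                 # no preference: uniform scores

print(len(select_top_p(peaked_query, keys)), "tokens kept for the peaked query")
print(len(select_top_p(flat_query, keys)), "tokens kept for the flat query")
```

The point of top-p over a fixed top-k is that the budget adapts: a query that clearly points at a few tokens should keep only a handful, while the flat query needs most of the 64 tokens to reach 90% of the mass.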

Two more for the homelab crowd. Someone on r/LocalLLaMA hit 1,000 tokens/sec on Qwen3.6 27B using V100s, prompting both eBay panic-buying jokes and a reminder that V100’s Volta architecture doesn’t support AWQ. A side-by-side from another user pegged dual RTX PRO 6000s at 1,800 tps on the same model at 64 concurrency. And Elon Musk says xAI will ship a 0.5T open-weights Grok next year; the r/LocalLLaMA thread is mostly people timestamping the promise next to the still-undelivered 2019 Tesla self-driving one.

Quantum-trained LLM, asterisks included. Multiverse Computing used IBM quantum hardware to fine-tune a production LLM via Cayley-parameterized unitary adapters, per Live Science, reporting reduced perplexity. The actual delta was about 1.4%, which r/artificial accurately described as “a dial turn rather than a profound change,” adding that “trained is a stretch.” File under “interesting that it ran at all” rather than anything you need to plan around.
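For scale, perplexity is the exponential of the mean per-token cross-entropy, so a relative perplexity change maps directly onto a loss change. The sketch below assumes the 1.4% figure is a relative reduction in perplexity (the report does not spell this out) and that loss is measured in nats; the function name is illustrative.

```python
import math


def loss_delta_from_ppl_change(relative_ppl_change: float) -> float:
    """Change in mean per-token cross-entropy (nats) implied by a relative perplexity change.

    Since PPL = exp(loss), new_loss - old_loss = ln(new_PPL / old_PPL).
    """
    return math.log1p(relative_ppl_change)


# Assumption: "about 1.4%" means perplexity fell by 1.4% relative to baseline.
delta = loss_delta_from_ppl_change(-0.014)
print(f"Implied loss change: {delta:.4f} nats per token")
```

Under that reading, the improvement is about 0.014 nats per token, which squares with the "dial turn" verdict.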

That’s it for Monday. Watch for whether Apple starts crediting more AI-assisted CVEs as the quarter goes on — if Google’s Chrome numbers are any guide, this is the start of a pattern, not a one-off.