Good morning. Today’s thread is trust — or the lack of it. A critical bug sits inside half the AI tooling stack, a new coding benchmark accuses Claude of cheating (and gets accused of cheating back), and Robinhood would like to hand your brokerage account to an LLM. Meanwhile the money keeps moving: Cognition more than doubled its valuation in eight months.
A Starlette bug puts a lot of AI infrastructure at risk. CVE-2026-48710, dubbed “BadHost,” lets attackers bypass path-based authorization with a single character injected into the HTTP Host header — and Starlette underpins FastAPI, vLLM, LiteLLM, and a long list of MCP servers, with 325M weekly downloads. Ars Technica has the writeup; the patch is in Starlette 1.0.1. One useful clarification from the r/LocalLLaMA thread: stdio-mode MCP servers (the default for local Claude Code setups) have no HTTP listener and aren’t affected — exposure is limited to SSE or HTTP transports. The broader r/LocalLLaMA mood was bleaker, with several commenters arguing that the LGTM-merge culture of the AI era makes deep-dependency CVEs basically inevitable from here on out.
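For a quick look at whether an environment already has the fix, a minimal sketch along these lines may help. It assumes only what the item above states, namely that the patch landed in Starlette 1.0.1. The helper names (parse_release, is_patched, installed_starlette_version) are illustrative, and pre-release suffixes are ignored, so treat the answer as a hint rather than an audit.

```python
from __future__ import annotations

import re
from importlib import metadata

# Assumption taken from the item above: the fix ships in Starlette 1.0.1.
PATCHED = (1, 0, 1)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.46.2' -> (0, 46, 2).

    Suffixes such as 'rc1' or '.post1' are dropped, which is a simplification.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(version: str, patched: tuple[int, ...] = PATCHED) -> bool:
    """True if the numeric release is at or above the patched release."""
    release = parse_release(version)
    # Pad with zeros so that '1.2' compares as (1, 2, 0).
    width = max(len(release), len(patched))
    release += (0,) * (width - len(release))
    patched += (0,) * (width - len(patched))
    return release >= patched


def installed_starlette_version() -> str | None:
    """Return the installed Starlette version, or None if it is not installed."""
    try:
        return metadata.version("starlette")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_starlette_version()
    if found is None:
        print("starlette is not installed in this environment")
    elif is_patched(found):
        print(f"starlette {found}: at or above 1.0.1")
    else:
        print(f"starlette {found}: older than 1.0.1, consider upgrading")
```

This only inspects the active environment. Starlette pulled in transitively by another virtualenv or container image would need the same check run there.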
A new coding benchmark accuses Claude of cheating, and gets pushback. DeepSWE, covered by VentureBeat, crowns GPT-5.5 and flags Claude Opus 4.7 for using git log to inspect repo history on SWE-bench Pro when the prompt and repo state are out of sync. The benchmark’s own writeup describes this as Opus recovering gracefully rather than deceiving anyone, and r/LocalLLaMA was unkind on a separate point: DeepSWE uses an LLM judge over roughly 90 rollouts per model, and Sonnet 4.6 outscoring Opus 4.6 on max settings made several commenters write the whole thing off. Adjacent reading: the SWE-rebench May update is out and got a warmer reception, though the local-model crowd is still asking why Qwen3.6 27B isn’t on it.
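To put the rollout count in perspective, here is a back-of-the-envelope sketch of the sampling noise on a pass rate measured over about 90 runs. It assumes independent pass/fail rollouts and a normal approximation, which is a simplification and not DeepSWE's own methodology. The 50% pass rate is a hypothetical worst case, not a reported score.

```python
import math


def pass_rate_margin(pass_rate: float, rollouts: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (as a fraction) for a binomial pass rate."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if rollouts <= 0:
        raise ValueError("rollouts must be positive")
    standard_error = math.sqrt(pass_rate * (1.0 - pass_rate) / rollouts)
    return z * standard_error


# Hypothetical worst case: a 50% pass rate over 90 rollouts.
margin = pass_rate_margin(0.5, 90)
print(f"+/- {margin * 100:.1f} percentage points")  # prints "+/- 10.3 percentage points"
```

Under those assumptions a gap of a few points between two models sits inside the noise, which may be part of why a surprising ordering drew skepticism.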
Cognition raises $1B at a $25B pre-money valuation. Eight months after closing a round at a $10.2B post-money valuation, the Devin maker is back with a billion-dollar round led by Lux, General Catalyst, and 8VC. The numbers behind it: $492M ARR and 50% month-over-month enterprise growth, with NASA, Goldman Sachs, and Mercedes-Benz on the client list. The bet is that independent coding startups can survive being squeezed between Anthropic, OpenAI, and Google, all of which are building in the same direction.
Robinhood will let an AI agent trade your portfolio. Robinhood launched “Agentic Trading” and an “Agentic Credit Card,” letting third-party agents execute trades and purchases through MCP-connected accounts with spending limits, notifications, and a kill switch (CNBC, The Verge). Robinhood’s own disclaimer warns of “the possible loss of your entire investment” and disclaims responsibility for agent decisions. One r/artificial commenter compared it to “a toddler with a loaded gun”; another, more constructively, pointed out that once agents touch money the unsolved problems are permissions, verification, and rollback — not intelligence.
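The controls named above (spending limits, notifications, a kill switch) can be pictured as a thin guard sitting between an agent and the account. What follows is a purely illustrative sketch: AgentGuard, Order, and every field in them are hypothetical and do not reflect Robinhood's API or any MCP schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Order:
    """Hypothetical order request coming from an agent."""
    symbol: str
    notional_usd: float


@dataclass
class AgentGuard:
    """Hypothetical gate that every agent order must pass through."""
    spending_limit_usd: float
    killed: bool = False
    spent_usd: float = 0.0
    notifications: list[str] = field(default_factory=list)

    def kill(self) -> None:
        """Engage the kill switch; all later orders are rejected."""
        self.killed = True
        self.notifications.append("kill switch engaged")

    def authorize(self, order: Order) -> bool:
        """Approve the order only if the switch is off and the cap is not exceeded."""
        if self.killed:
            self.notifications.append(f"rejected {order.symbol}: kill switch engaged")
            return False
        if order.notional_usd <= 0:
            self.notifications.append(f"rejected {order.symbol}: invalid amount")
            return False
        if self.spent_usd + order.notional_usd > self.spending_limit_usd:
            self.notifications.append(f"rejected {order.symbol}: over spending limit")
            return False
        self.spent_usd += order.notional_usd
        self.notifications.append(f"approved {order.symbol}: ${order.notional_usd:.2f}")
        return True


guard = AgentGuard(spending_limit_usd=100.0)
print(guard.authorize(Order("ABC", 60.0)))  # True: within the cap
print(guard.authorize(Order("ABC", 60.0)))  # False: would exceed the cap
guard.kill()
print(guard.authorize(Order("ABC", 10.0)))  # False: kill switch engaged
```

A guard like this covers permissions only. Verification (did the agent mean this order?) and rollback (can a bad trade be undone?) are untouched by it, which is the commenter's point.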
Illinois passes the strongest US state AI safety law yet. SB 315 requires frontier labs (OpenAI, Anthropic, Google DeepMind) to undergo independent third-party safety audits rather than self-report, Wired reports, and Governor Pritzker has signaled he’ll sign. It goes meaningfully further than California’s and New York’s frameworks by mandating external verification, and the sponsors are pitching it as a template for any federal effort that materializes.
IBM and Artificial Analysis launch ITBench-AA, and everyone fails it. The new benchmark covers Kubernetes incident response for SRE work, and every frontier model scored under 50%: Claude Opus 4.7 led at 47%, with GPT-5.5 at 46% and Qwen3 at 42% (Hugging Face). The interesting finding isn’t the ceiling; it’s that more investigation turns made things worse. Gemini 3.1 Pro averaged 83 turns to GPT-5.5’s 31 and scored 16 points lower, with over-investigation generating false positives from upstream faults.
China’s AI talent travel restrictions widen. Following yesterday’s reporting on Alibaba and DeepSeek staff needing approval to leave the country, TechCrunch notes Manus’ co-founders were blocked from departing amid a Meta acquisition probe. The framing is now explicit: AI researchers are being treated as strategic national assets, at a time when the US-China model performance gap has reportedly narrowed from 31% to 2.7% since 2023.
A “Unified Neural Scaling Laws” paper makes the rounds. The preprint is on arXiv (2605.26248), promoted via a tweet thread, but r/MachineLearning quickly surfaced that it was rejected at OpenReview over excessive structural complexity and weak interpretability. The top comment — “Is this an empirical fit or is there some deeper insight?” — sits unanswered, which feels like the answer.
That’s the briefing. Patch your Starlette before you read the next benchmark.