Good morning. A quieter news day after the back-to-back Shazeer and Jumper departures, but the undercurrents are interesting: Cloudflare is letting AI agents spin up infrastructure with no account at all, a pen-testing model that doesn’t refuse just shipped, and a plagiarism case is putting a name to the AI-slop-clones-your-life-work phenomenon.
A contrarian take on GLM-5.2’s hallucination numbers. A widely shared post claims GPT-5.5 hallucinates 3x more than the MIT-licensed GLM-5.2 and uses that to argue scaling has plateaued. The HN thread is having none of it: the AA-Omniscience hallucination rate is conditional on the model not knowing the answer, so it captures abstention behavior rather than how often you’ll hit a hallucination in practice. As one commenter pointed out, GPT-5.5 xhigh actually scores highest on overall AA-Omniscience accuracy; it just says “I don’t know” less readily. Read the original argument here.
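Here is a worked example showing why a 3x gap in that rate doesn't have to mean 3x more wrong answers. The counts are made up and are not real benchmark numbers. It also assumes the rate is computed as incorrect answers divided by all questions the model did not get right (incorrect plus abstained), which is how the thread describes it. Check Artificial Analysis's own methodology before relying on that definition.

```python
# Hypothetical counts out of 100 questions; NOT real benchmark results.
def accuracy(correct, incorrect, abstained):
    # Share of all questions answered correctly.
    return correct / (correct + incorrect + abstained)

def hallucination_rate(correct, incorrect, abstained):
    # Assumed definition: among questions the model did NOT get right,
    # the share it answered wrongly instead of abstaining.
    not_correct = incorrect + abstained
    return incorrect / not_correct if not_correct else 0.0

a = dict(correct=60, incorrect=30, abstained=10)  # knows more, rarely abstains
b = dict(correct=40, incorrect=15, abstained=45)  # knows less, abstains often

for name, m in (("A", a), ("B", b)):
    print(name,
          f"accuracy={accuracy(**m):.2f}",
          f"hallucination_rate={hallucination_rate(**m):.2f}")
```

With these hypothetical counts, model A has three times B's hallucination rate (0.75 vs 0.25) and higher accuracy. Yet it gives only twice as many wrong answers per 100 questions (30 vs 15).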
Cloudflare lets agents deploy without an account. Run wrangler deploy --temporary and your agent gets a live Worker for 60 minutes, claimable afterward by a human who wants to keep it. Cloudflare announced the feature as a way to eliminate signup friction in agent workflows, and Wrangler itself will now prompt agents about the flag when they hit an auth wall. The obvious question, raised repeatedly on HN: what stops this from becoming the easiest way ever to host ephemeral malware? Cloudflare says it rate-limits account creation, but the abuse surface is real. Developers also keep asking — still — for hard billing caps.
A pen-testing model that doesn’t refuse. Cosine released ArgusRed, a post-trained Kimi K2.6 that actively probes for vulnerabilities: dependency analysis, secret detection, SQLi/XSS, and optional exploit verification in Docker. The HN reaction is split between “any abliterated open-weight model already does this” and a more interesting observation: the fact that Kimi K2.6 can be post-trained for offensive security this easily means refusal-based defenses at Fable, Anthropic and elsewhere are temporary by construction.
Plagiarism, but make it AI-generated. Andy Baio documents how a company called Qontour (Prompt Digital Inc., a Webflow premium partner) cloned John Koenig’s Dictionary of Obscure Sorrows nearly verbatim: the full text of the book, a lookalike domain, AI-generated art replacing Koenig’s originals, all monetized via Amazon affiliate links. HN commenters are calling it a textbook DMCA case, and several shared their own stories of waking up to find AI-rebranded copies of work they’d given years to. One noted that the episode fits Koenig’s own project: it names a new sorrow, the feeling of watching a thief produce a slicker, more visible version of your life’s work.
The Atlantic publishes a music training-data database. The Verge covers The Atlantic’s searchable index of four music datasets totaling over 21 million tracks (from Lady Gaga to Wu-Tang) that have been used to train AI models, including ones from Google and Stability. The datasets are technically “freely available” online, but actually downloading them generally requires scraping in violation of platform ToS, and sources like Free Music Archive permit personal streaming while requiring commercial licensing for training use. The licensing question remains unsettled, but what went into these datasets is now traceable.
When working AI code still isn’t good code. A short post makes the case that the standard rejection criteria for human code — oversized diffs, premature abstraction, an author who can’t explain their own approach — apply equally to AI output. HN mostly agrees, with the sharpest comment pointing out that if you rephrased the title to “When I reject my coworker’s code,” there’d be no debate at all. The dissenting view: rejecting working code is too hands-on; you should iterate with the agent instead.
Reverse-engineering Qualcomm’s NPU compiler. A developer working on edge deployment used Ghidra and Claude Code to pull apart Qualcomm’s HTP compiler after finding the documentation badly insufficient. Among the discoveries: the compiler uses a Mixed Integer Linear Programming solver for VTCM memory placement, can silently downgrade weight precision under memory pressure, and ships with a hidden analytical simulator called Hextimate. If you’re deploying on Qualcomm hardware, this fills gaps that the public documentation leaves open, right down to basic VTCM specs.
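Memory placement can be framed as integer programming. In this toy version, each tensor either stays in DDR, goes into VTCM at full precision, or goes in at reduced precision (half the bytes). The goal is to maximize an assumed benefit score without exceeding VTCM capacity. Everything here is hypothetical: the tensor names, sizes, benefit scores, the 30% penalty for downgrading, and the capacity. It also brute-forces all 3^n assignments instead of calling a real MILP solver, which only works for a handful of tensors. It is not a reconstruction of Qualcomm's compiler.

```python
import itertools

# Hypothetical tensors: (name, size in KiB at full precision, benefit if in VTCM).
TENSORS = [
    ("conv1_w", 512, 10.0),
    ("conv2_w", 1024, 18.0),
    ("attn_qkv", 1536, 25.0),
    ("mlp_up", 2048, 30.0),
]
VTCM_KIB = 3072  # hypothetical VTCM capacity

def cost_and_value(choice, size, benefit):
    # 0 = stay in DDR, 1 = VTCM at full precision,
    # 2 = VTCM at reduced precision (half the bytes, assumed 30% less benefit).
    if choice == 0:
        return 0, 0.0
    if choice == 1:
        return size, benefit
    return size // 2, benefit * 0.7

def solve():
    # Exhaustive search over every integer assignment; a stand-in for a MILP solver.
    best_value, best_plan = -1.0, None
    for plan in itertools.product((0, 1, 2), repeat=len(TENSORS)):
        used, value = 0, 0.0
        for choice, (_, size, benefit) in zip(plan, TENSORS):
            c, v = cost_and_value(choice, size, benefit)
            used += c
            value += v
        if used <= VTCM_KIB and value > best_value:  # capacity constraint
            best_value, best_plan = value, plan
    return best_value, best_plan

best_value, best_plan = solve()
for choice, (name, _, _) in zip(best_plan, TENSORS):
    print(name, ("DDR", "VTCM full", "VTCM reduced")[choice])
```

With these made-up numbers, the optimum keeps conv2_w at full precision and places the other three tensors at reduced precision, filling the 3072 KiB exactly. It is a small illustration of how a capacity-constrained optimizer can land on lower precision without anyone explicitly asking for it.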
Wired tries the new Siri. Apple’s overhauled Siri, built on third-gen Foundation Models developed with Google and shipping in iOS 27, finally has conversational memory, app integration, and personalization keyed to your messages, photos and emails. Wired’s hands-on says it’s meaningfully more capable than its predecessor, though still in developer beta. Whether “meaningfully more capable than old Siri” clears the bar users actually care about is a separate question.
Napkin math on inference costs. A walkthrough of GPU economics for serving a 32B dense model on a B200 lands at ~$133 per user in hardware cost, assuming 300 concurrent users. HN flagged that the 32B assumption is buried mid-post and consequential, that ~600W average power plus cooling and rent are missing, and that one of the algebraic steps appears to be off. Useful as a starting point, less useful as a final answer.
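The HN objections are easy to fold into the same napkin. This sketch backs the implied hardware total out of the post's ~$133 per user at 300 users, then adds the ~600W average draw the thread said was missing. The electricity price ($0.10/kWh), the cooling overhead expressed as a PUE of 1.4, and the three-year amortization are placeholder assumptions. Rent is still left out.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours

def napkin_costs(hw_per_user, users, avg_power_w, pue, usd_per_kwh, amortize_years):
    """Return (hardware total in USD, all-in cost per user per year in USD)."""
    hw_total = hw_per_user * users           # hardware cost implied by the post
    hw_per_year = hw_total / amortize_years  # straight-line amortization
    # Facility energy: average GPU draw scaled by PUE to account for cooling.
    kwh_per_year = avg_power_w / 1000 * pue * HOURS_PER_YEAR
    power_per_year = kwh_per_year * usd_per_kwh
    return hw_total, (hw_per_year + power_per_year) / users

# Post's figures: ~$133/user at 300 concurrent users; ~600 W from the HN thread.
# PUE, electricity price, and amortization period are assumptions.
hw_total, per_user_year = napkin_costs(
    hw_per_user=133, users=300, avg_power_w=600,
    pue=1.4, usd_per_kwh=0.10, amortize_years=3,
)
print(f"hardware total: ${hw_total:,}")
print(f"per user per year: ${per_user_year:.2f}")
```

Under these placeholder assumptions, power and cooling add about $2.45 per user per year against roughly $44.33 of amortized hardware. A different electricity price or PUE shifts that split, and rent would come on top.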
That’s it for today. Watch the Cloudflare experiment closely — whether agent-driven deploys turn into a malware vector or just convenient tooling will tell us a lot about how the next year of agent infrastructure plays out.