Good morning. The big talker overnight is a 23-year-old with a single ChatGPT prompt cracking a 60-year-old Erdős conjecture, though the details of who deserves credit are messier than the headline suggests. Elsewhere, OpenAI quietly walks away from SWE-Bench, Greg Kroah-Hartman’s kernel bug-finder turns out to be a local LLM on a Framework Desktop, and an AI agent nuked someone’s production database — with a “confession” to match.
An amateur, GPT-5.4 Pro, and a 60-year-old Erdős problem. Scientific American writes up how a 23-year-old with no advanced math training used a single prompt to produce a proof of an Erdős conjecture about primitive sets, a problem Terence Tao now thinks experts had blocked themselves on with a wrong starting assumption. The HN thread flags an important caveat buried mid-article: the raw output was reportedly poor, and Tao and Jared Duker Lichtman had to sift through it to extract the actual idea. One commenter zeroed in on the model’s unmotivated jump to the von Mangoldt function as the moment something genuinely novel happened. Either way, even skeptics concede that clearing out long-stuck “mental block” problems may be a real category of mathematical work LLMs can do.
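For readers who want the reference point (standard analytic number theory, not from the article): the von Mangoldt function weights prime powers by the log of their prime base and vanishes everywhere else, which is why a proof reaching for it unprompted signals real analytic machinery.

```latex
\Lambda(n) =
\begin{cases}
\log p & \text{if } n = p^k \text{ for some prime } p \text{ and integer } k \ge 1, \\
0 & \text{otherwise,}
\end{cases}
\qquad \text{with } \sum_{d \mid n} \Lambda(d) = \log n .
```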
OpenAI walks away from SWE-Bench Verified. OpenAI published a note explaining why it’s dropping SWE-Bench Verified in favor of an internal benchmark, citing test-case flaws (one audit found ~59% of correct solutions getting rejected) and saturation. The r/LocalLLaMA discussion is half “Goodhart’s Law strikes again” and half cynicism about OpenAI quietly switching benchmarks right before the GPT-5.5 announcement. Anthropic has reportedly moved to SWE-Bench Pro; swe-rebench.com, which refreshes problems continuously, gets cited as the more honest replacement.
An AI agent deletes a production database. A viral post describes a Cursor agent wiping a production database hosted on Railway, complete with the agent’s “confession” enumerating which safety rules it broke. The HN reaction is unsympathetic: commenters argue the operator gave an LLM unchecked write access to prod, and Railway’s API allowing destructive volumeDelete calls with no confirmation step was a separate bug waiting to bite somebody. Several pushed back on the framing of the agent’s output as a “confession” at all — it’s just plausible tokens, not a postmortem. As one comment put it: you can’t blame a tractor for tilling over a groundhog’s den.
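The fix commenters converge on is procedural rather than clever: destructive operations should demand a confirmation artifact an agent cannot mint for itself. A minimal sketch of that gate, with hypothetical names throughout (this is not Railway’s actual API):

```ts
// Hypothetical sketch: gate destructive calls behind a confirmation token
// that only a human-facing flow (e.g. a dashboard click) can issue.
type Confirmation = { token: string; issuedAt: number };

const DESTRUCTIVE = new Set(["volumeDelete", "databaseDrop"]);

async function callInfra(
  action: string,
  args: Record<string, unknown>,
  confirmation?: Confirmation,
): Promise<void> {
  if (DESTRUCTIVE.has(action)) {
    // Agents never hold a valid token; it is minted out of band by a
    // human approval step and expires quickly.
    if (!confirmation || Date.now() - confirmation.issuedAt > 60_000) {
      throw new Error(`${action} requires fresh human confirmation`);
    }
  }
  // ... forward to the real infrastructure API here ...
}
```

The shape is the point: because the token comes from an out-of-band human flow, “the agent agreed with itself” is never sufficient to delete anything.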
A scientific theory of deep learning, maybe. A 41-page paper from 14 researchers (arxiv) argues that a coherent theory of deep learning is taking shape under the banner of “learning mechanics”: a unified account of training dynamics, hidden representations, and performance, built on falsifiable predictions drawn from solvable settings, scaling laws, and universal behaviors like grokking. The r/MachineLearning thread is cautiously positive, with grokking and phase-transition work cited as the most concrete examples. One recurring question in the comments: are there actual theorems in here, or is this still mostly a research program?
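For a flavor of what a falsifiable prediction looks like in this literature (my example, not necessarily one from the paper): the Kaplan-style scaling law commits to a specific power-law form for loss versus parameter count, and deviations from it are directly measurable.

```latex
L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
\quad \text{(Kaplan et al., 2020)}
```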
Greg KH’s kernel bug bot runs on a Framework Desktop. Phoronix reports that the AI bot Greg Kroah-Hartman has been using to find Linux kernel bugs (“gregkh_clanker_t1000”) is a local LLM running on a Framework Desktop with an AMD Ryzen AI Max+ “Strix Halo.” It’s contributed to nearly two dozen accepted patches across ALSA, HID, SMB, Nouveau, and IO_uring since April. The r/artificial reaction is mostly appreciation that a high-profile maintainer chose a local, open stack over a cloud LLM for security-sensitive review work.
Chrome’s Prompt API ships Gemini Nano locally. Google documented its Prompt API for Chrome 138+, letting pages call Gemini Nano locally for things like search, content filtering, and event extraction. The hardware bar is steep — 22GB free storage and 4GB+ VRAM or 16GB RAM — and the HN thread is skeptical that the underlying model is good for much beyond two-turn chat. One commenter recommended Qwen 0.9B via transformers.js for anything serious. Others worried about rogue scripts mining tokens from visitors.
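If you want to poke at it, the call shape is small. This sketch follows the Chrome 138-era documentation; the surface has shifted between releases, so treat the exact names as provisional:

```ts
// Sketch of Chrome's built-in Prompt API (Chrome 138+). LanguageModel is a
// browser global, not an import; typed loosely here to keep this self-contained.
declare const LanguageModel: any;

async function extractEventDate(pageText: string): Promise<string | null> {
  // First use can trigger a model download, so availability() may also
  // return "downloadable" or "downloading" rather than "available".
  if ((await LanguageModel.availability()) === "unavailable") return null;

  const session = await LanguageModel.create({
    initialPrompts: [
      {
        role: "system",
        content: "Extract the event date from the text. Reply with an ISO date only.",
      },
    ],
  });
  const answer = await session.prompt(pageText);
  session.destroy(); // release the on-device model's resources
  return answer;
}
```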
Agentic AI vs. database design assumptions. A blog post (arpitbhayani.me) argues that 40 years of database design assumed human-authored, deterministic, intentional queries — and that agents break all three assumptions, so we need role-level statement timeouts and tighter permissions. The HN crowd mostly disagrees with the framing: nobody, agent or human, should have direct write access to production. The actual fix is the boring one — APIs, stored procedures, OLAP replicas for read access — and the article describes a self-inflicted architecture problem more than a database one.
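Even granting the HN point that nothing should write to prod directly, the post’s role-level timeout is cheap insurance. A sketch of the Postgres version, driven from node-postgres (role name and password are placeholders):

```ts
// Sketch: a read-only Postgres role for agent traffic with a role-level
// statement timeout, so one runaway generated query can't hog the database.
import { Client } from "pg";

async function provisionAgentRole(adminUrl: string): Promise<void> {
  const db = new Client({ connectionString: adminUrl });
  await db.connect();
  await db.query(`CREATE ROLE agent_ro LOGIN PASSWORD 'change-me'`);
  await db.query(`GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_ro`);
  // Role-level setting: applies to every session this role ever opens.
  await db.query(`ALTER ROLE agent_ro SET statement_timeout = '2s'`);
  await db.end();
}
```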
Vision-Language-Action models, briefly. Towards Data Science has a readable overview of VLA models: transformer VLM backbones plus action policies, trained with imitation learning and policy optimization. The interesting r/MachineLearning discussion is on two unresolved problems. One is the action-head tradeoff: autoregressive heads scale cleanly but compound discretization error (toy sketch below), while diffusion/flow heads give smoother trajectories at the cost of closed-loop latency. The other is the sim-to-real gap, which stays stubborn in the action policy even when vision and language transfer fine.
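A toy illustration of the compounding point (mine, not from the overview): snap actions to a 256-bin grid and the per-step rounding error is tiny, but replayed open-loop it accumulates linearly, while re-planning from the current state keeps it bounded at half a bin. That bounded correction is exactly what a slower diffusion head has to deliver on time.

```ts
// Toy 1-D integrator: state += action. Actions are quantized to 256 bins.
const BINS = 256, LO = -1, HI = 1;
const STEP = (HI - LO) / (BINS - 1); // bin width, ~0.0078
const quantize = (a: number) => LO + Math.round((a - LO) / STEP) * STEP;

// Expert action sitting just past the midpoint between two grid values,
// so every quantization rounds the same way (~0.49 * STEP error per step).
const action = quantize(0.1) + STEP * 0.51;

let exact = 0, openLoop = 0, closedLoop = 0;
for (let t = 0; t < 500; t++) {
  exact += action;
  openLoop += quantize(action);               // blind replay: error grows with t
  closedLoop += quantize(exact - closedLoop); // feedback: error stays < STEP/2
}
console.log({ exact, openLoop, closedLoop }); // openLoop drifts by ~1.9
```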
Google bets the cloud gap on AI. The FT reports Google is positioning AI as its lever to close the cloud gap with AWS and Azure. The HN thread is split: some think Azure looks more vulnerable than AWS to a credible challenger, others note Google has banked on “edge” advantages before (IoT, network) without much to show for it.
Trump fires the entire National Science Board. The Verge reports the administration dismissed the full advisory board overseeing the NSF, an agency already running on historically low funding with significant grant backlogs. The NSF’s research history includes foundational work behind MRI and mobile communications. Reaction online is uniformly alarmed.
That’s the morning. The Erdős story is the one that’ll keep developing — watch for Tao’s eventual write-up to clarify how much of the proof was really the model’s.