AI News, May 21, 2026: o3 Fills 125 Pages to Kill Erdős Conjecture, OpenAI Readies S-1 at $850B

Good morning. The day’s defining thread is reasoning at scale: OpenAI’s o3 cranked out a 125-page chain of thought to disprove an Erdős conjecture, while the IPO machinery whirs to life behind it. Elsewhere, Cohere quietly returned from the dead with an open-weights MoE, and Anthropic borrowed Elon’s GPUs.

OpenAI’s o3 disproves an Erdős conjecture. OpenAI says its o3 model produced a counterexample to a longstanding conjecture in discrete geometry, drawing on algebraic number theory in a chain of thought spanning 125 pages. A postdoc on Hacker News called the tweaks “non-trivial” rather than pattern-matched, though others pointed out finding a counterexample is generally easier than proving a conjecture true, and skeptics flagged the usual training-data questions. One commenter wondered aloud why every AI-math breakthrough seems to involve an Erdős problem specifically — possibly because there are thousands of them, conveniently sized for model attention spans.

OpenAI could file for IPO as soon as Friday. OpenAI is preparing a confidential S-1 with Goldman Sachs and Morgan Stanley at an $850B private valuation, CNBC reports, with SpaceX-xAI racing toward its own filing on a parallel track. The timing raised eyebrows on HN, given recent reporting that just two weeks ago the CFO had said the books weren’t ready for public scrutiny. Several commenters read the parallel filings as insiders trying to lock in capital before sentiment turns; one noted dryly that index-fund holders will own a slice either way.

Gemini 3.5 Flash gets a frosty second look. A day after Google’s I/O launch, independent benchmarks are landing and they’re not flattering. Nick Lothian’s agentic SQL benchmark has 3.5 Flash scoring 19/25 — slower, more expensive, and worse than 3.1 Flash Lite Preview, and outperformed across the board by Gemma4 26B-A4B. The “Listen to article” feature on Google’s own announcement post reportedly hallucinates a passage in Russian around the 4:15 mark, which is the kind of detail that writes itself.

Qwen3.7-Max claims the non-hallucination crown. Alibaba released Qwen3.7-Max, pitched as a frontier agentic model with a SOTA non-hallucination rate on AA-Omniscience that reportedly beats Opus 4.7, Gemini 3.1 Pro, and GPT-5.5. As usual, the benchmark tables conveniently omit the latest competitor versions, and the open-weights question is unanswered, though Qwen has a track record of dropping Hugging Face releases a week behind proprietary ones. Separately, a leaked roadmap image suggests another 27B is incoming, with the LocalLLaMA crowd lobbying hard for a 35B MoE to fit on 16GB cards.

Cohere returns with Command A Plus. Cohere shipped Command A Plus, a 218B-parameter MoE with 25B active parameters, multimodal text-and-image input, and, notably, an Apache 2.0 license, a real departure from its previous restrictive terms. Reception on r/LocalLLaMA is cautiously warm: longtime fans of the original Command R+ are glad to see Cohere back, though some grumbled that recent models feel filled with “scale.com slop” and GPT-OSS-style refusals. No reasoning variant yet, and benchmarks against MiniMax M2.7 or DeepSeek V4 are conspicuously absent.
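
For readers sizing up whether a 218B-total, 25B-active model is anywhere near local-friendly, here is a rough back-of-envelope sketch in Python. It counts weight storage only and assumes a uniform bit width, decimal gigabytes, and no quantization overhead, KV cache, or activations. None of those assumptions come from Cohere, and the function names are ours, purely for illustration.

```python
# Back-of-envelope weight footprint for a mixture-of-experts model.
# Assumptions (not from Cohere): weights only, uniform bit width,
# decimal gigabytes (1 GB = 1e9 bytes), no quantization overhead.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given parameter count."""
    return params_billion * bits_per_weight / 8


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of the total parameters that is active per token."""
    return active_billion / total_billion


TOTAL_B = 218   # total parameters, as reported above
ACTIVE_B = 25   # active parameters per token, as reported above

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_gb(TOTAL_B, bits):.0f} GB")
    print(f"active share per token: {active_fraction(ACTIVE_B, TOTAL_B):.1%}")
```

Under these assumptions, only about 11% of the parameters fire per token, but all 218B still have to be stored somewhere, so the weight footprint stays in triple-digit gigabytes even at 4 bits. Real deployments will differ depending on quantization format and how experts are offloaded.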

Anthropic gets access to Colossus 2. Anthropic is expanding onto xAI’s Colossus 2 facility with GB200s, per a tweet making the rounds on HN. The read there is bearish for xAI: between Cursor training on Colossus 2 and Anthropic now occupying parts of both Colossus 1 and 2, it looks like Musk is repositioning xAI as an infra play rather than a frontier-model contender. Others noted that Colossus’s reputation for unpermitted gas turbines hasn’t improved.

Stable Audio 3 ships small, fast, and open. Stability released Stable Audio 3, a family of latent diffusion models that generate variable-length audio with inpainting support, trained on licensed and CC data. One HN commenter clocked 120 seconds of audio in under 2 seconds on a 3090. Quality skews heavily toward electronica and falls short of Suno on vocals or acoustic textures, but the small models have open weights and there’s already a one-liner MLX install for Macs.

That’s the morning. The OpenAI S-1, if it lands Friday, will give us our first real look at the numbers everyone’s been speculating about for two years — assuming the redactions leave anything legible.