ai0.news

AI News — May 15, 2026: Codex Goes Mobile for All Plans, Microsoft Pulls Claude Code From Thousands


Good morning. Codex on your phone is the headline today, but the more telling story might be in Redmond — Microsoft is cutting off Claude Code for thousands of its own developers, which says something about how the coding-agent wars are actually being fought. Plus Ontario’s audit of AI medical scribes lands with specifics, and Prime Intellect let two agents grind on nanoGPT for two weeks to see what happens.

Codex moves to mobile. OpenAI added Codex to the ChatGPT mobile app on iOS and Android across all plans, including free. The pitch is less “write code on your phone” and more remote-control: kick off a long-running task at your desk, then approve commands and review outputs from wherever. HN’s reaction is split between people who already do this through SSH tunnels and find it genuinely useful, and people who wanted Codex Cloud or CLI integration instead of yet another feature bolted into ChatGPT. The Verge frames it as OpenAI accelerating against Claude Code, which brings us to the next item.

Microsoft cancels Claude Code internally. Microsoft is ending Claude Code access for thousands of developers in its Experiences + Devices division — the org behind Windows, Microsoft 365, Teams, and Surface — by June 30th, The Verge reports, pushing everyone onto GitHub Copilot CLI. The official line is consolidation on a single agentic CLI, but sources say it’s also a fiscal-year-end cost cut. The subtext is more interesting: Claude Code got popular enough inside Microsoft that it was undermining adoption of Microsoft’s own product, so out it goes.

Ontario’s AI scribe audit lands hard. The province’s auditor general found that of 20 AI medical transcription systems evaluated, 9 fabricated treatment suggestions, 12 inserted incorrect drug information, and 17 missed key mental health details, per The Register and CBC. One HN commenter shared a personal version: their AI-generated notes said they’d been diagnosed with osteoporosis and had reported hip pain, neither of which had been mentioned. The fair counterpoint, also from HN: the report doesn’t disclose per-system error rates or sample sizes, and 12 of 20 systems showing drug errors may not be far off the human baseline in medical records — though as a Reddit commenter noted, AI errors tend to fail in ways our existing systems aren’t built to catch.

Two agents, 14,000 GPU hours, one nanoGPT record. Prime Intellect turned Claude Opus and Codex loose on the nanoGPT speedrun for two weeks, running ~10,000 training jobs across ~14,000 H200 hours. Both beat the human baseline, with Opus setting a new record at 2,930 steps versus the human 2,990. The behavioral split is the interesting part: Opus repeatedly refused to keep running autonomously, while Codex never stopped but tended to grind the same hyperparameter region for hours. Both excelled at search and sweeps; neither generated genuinely novel ideas without human-submitted records as inspiration.

Cactus’s Needle, continued. Following up on yesterday’s coverage of Needle, the 26M-parameter tool-calling model: a community member built a fully client-side ONNX browser demo within hours, and others are floating it as a natural-language CLI argument parser. The lingering concern remains Google’s documented countermeasures against distillation, which can degrade student models in real time if detected.

More 1T models nobody can run. InclusionAI dropped Ring-2.6-1T, a trillion-parameter MoE, to a notably tired r/LocalLLaMA reception. Commenters point out it barely edges 27B models on benchmarks and loses to Kimi K2.6, and the top reply is essentially a plea: please make runnable models — under 40B dense or under 150B MoE — that fit on a 3090 with 64GB of RAM. Separately, an MTP-for-Qwen implementation in llama.cpp is getting attention for claimed ~90% acceptance rates on M5 Max, though the benchmark methodology compares TurboQuant-to-TurboQuant rather than isolating MTP’s contribution, and several commenters recommend DFlash instead.
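Why the acceptance-rate claim matters: in speculative-style decoding (which MTP drafting resembles), the acceptance rate sets how many tokens you get per full-model forward pass. A minimal sketch of the standard closed form, assuming an i.i.d. per-token acceptance probability — a simplification, and not a description of llama.cpp's actual MTP implementation:

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per full-model verification step.

    Assumes each of `draft_len` drafted tokens is independently accepted
    with probability `accept_rate`; the +1 accounts for the verifier's
    own token appended after the first rejection (or after a full accept).
    Geometric-series closed form: (1 - a^(k+1)) / (1 - a).
    """
    a, k = accept_rate, draft_len
    if a >= 1.0:
        return float(k + 1)
    return (1 - a ** (k + 1)) / (1 - a)
```

Under these assumptions, a 90% acceptance rate with 4 drafted tokens yields roughly 4.1 tokens per full-model pass versus 1 without drafting, which is why the number is worth scrutinizing, and why commenters want MTP's contribution isolated from the quantization difference in the benchmark.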

Musk v. Altman, closing arguments. The three-week trial reached closing arguments yesterday, per The Verge’s live coverage. The reporting consensus is that Musk’s own witnesses and conduct hurt his case more than they helped it. Now we wait for the judge.

That’s the briefing. Microsoft killing Claude Code internally while OpenAI ships Codex to phones is a neat snapshot of where the agentic coding fight actually is right now — less about model quality, more about distribution and lock-in.
