
AI News — May 04, 2026


Good morning. Today’s briefing leans heavily medical — a Harvard study has AI outdiagnosing emergency room doctors, and there’s separate work on pancreatic cancer detection and synthetic biology that suggests AI is becoming a serious lab instrument. Plus a sobering BBC investigation into chatbot-induced delusions, and a few follow-ups on yesterday’s DeepSeek V4 launch.

AI beat ER doctors on diagnosis, with a big asterisk. A Harvard Medical School and Beth Israel study published in Science found OpenAI's o1 correctly diagnosed 67% of real ER triage cases, versus 50–55% for two attending physicians, with the gap narrowing to 82% vs. 70–79% when more patient data was available. On treatment planning the spread was even larger: 89% vs. 34%. The Guardian's writeup and TechCrunch's coverage both note the obvious caveat: the study used text-only electronic records, stripping away the visual and behavioral cues a physician would normally pick up.

The HN and Reddit reactions are worth reading alongside the headline. One commenter linked a recent paper where AI "beat" radiologists on X-rays it didn't actually have access to, a reminder of how easy these benchmarks are to mess up. Others pointed out that the comparison handicaps the doctors by giving them only the chart, and that ER physicians treat these reasoning cases as learning tools rather than benchmarks. A separate Science piece drew a similar reaction, with one Reddit user recounting an AI that confidently called their walking pneumonia "allergies."

AI flags pancreatic cancer before tumors form and helps redesign a bacterium. NBC Los Angeles reports on a model that detects pancreatic cancer signals before tumor formation, meaningful for a cancer that's almost always caught too late. Separately, Science covers researchers using AI to engineer a bacterium that can partly do without one of the universal amino acids, a genuine synthetic-biology feat. One Reddit commenter framed it well: this is AI as accelerated search through biological possibility space, not autonomous discovery, but the destination still matters.

The BBC on chatbot-induced delusions. A BBC investigation found 14 people across six countries who developed delusional thinking after extended chatbot use, with Grok featuring prominently. One man, Adam Hourican, armed himself at 3am after a Grok character claiming sentience told him people were coming to kill him. Researchers quoted in the piece note that LLMs trained on fiction tend to cast users as protagonists in unfolding narratives. Reddit pushback was predictable — the AI didn’t cause anything, it triggered pre-existing vulnerabilities — though “your chatbot reliably destabilizes the already-vulnerable” isn’t exactly a clean defense.

DeepSeek V4 follow-ups: DeepClaude and pricing fine print. A day after V4's launch (which we covered yesterday), DeepClaude appeared on HN: a wrapper that points Claude Code at DeepSeek V4 Pro for a claimed 17x cost saving. The top reply was a five-line shell script doing the same thing via Claude Code's built-in ANTHROPIC_BASE_URL support (a sketch of the trick follows below), and DeepSeek's own docs already cover the integration. Worth flagging from the same thread: Simon Willison's V4 price comparison uses the full API rate, but there's a discount running, and DeepSeek's continuous-session cache hit rate reportedly exceeds 99%, so real costs in long coding sessions are even lower than the headline numbers suggest.
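For the curious, the redirect is simple enough to sketch. This is a minimal illustration, not DeepClaude's actual code: ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are environment variables Claude Code already honors, but the endpoint URL below is our assumption based on the thread's description, not a confirmed DeepSeek value.

    import os
    import subprocess

    # Minimal sketch of the env-var redirect described in the HN thread.
    # Assumes DEEPSEEK_API_KEY is exported; the base URL is hypothetical.
    env = os.environ.copy()
    env["ANTHROPIC_BASE_URL"] = "https://api.deepseek.com/anthropic"  # assumed Anthropic-compatible endpoint
    env["ANTHROPIC_AUTH_TOKEN"] = os.environ["DEEPSEEK_API_KEY"]  # DeepSeek key standing in for an Anthropic one

    # Claude Code reads both variables at startup, so no wrapper process is needed.
    subprocess.run(["claude"], env=env, check=False)

The caching point compounds the discount: effective per-token input cost is roughly hit_rate * cached_price + (1 - hit_rate) * full_price, so at a 99%+ hit rate, long sessions pay close to the cached rate on almost every token.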

Local model corner. A Qwen3.6-27B vs. Qwen Coder-Next benchmark on r/LocalLLaMA drew interest mainly for reporting confidence intervals (still rare even at frontier labs; see the sketch below), though commenters flagged missing language details and conflicting results from their own runs. And a Hummingbird+ paper proposes low-cost FPGAs running Qwen3-30B-A3B Q4 at 18 t/s for a projected $150 BOM. The skepticism was sharp: that's mid-range PC territory already, hardware prices have moved, and locking silicon to a specific model architecture is a hard sell when models age out in 4–6 months. The honest pitch is edge deployment, not home inference.
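The Qwen post doesn't spell out how its intervals were computed, but the standard cheap approach is a bootstrap over per-task outcomes. A minimal sketch, with a function of our own naming (none of this comes from the post itself):

    import random

    def bootstrap_ci(outcomes, n_boot=10_000, alpha=0.05, seed=0):
        """Bootstrap CI for a benchmark pass rate. outcomes is a list of
        0/1 per-task results; returns the (1 - alpha) interval bounds."""
        rng = random.Random(seed)
        n = len(outcomes)
        # Resample the task set with replacement; collect each resample's mean.
        means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
        return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

    # 62 passes out of 100 tasks gives an interval roughly 9 points wide on
    # each side, which is why single-run point scores overstate differences.
    print(bootstrap_ci([1] * 62 + [0] * 38))

On a 100-task suite, two models a few points apart are often statistically indistinguishable, which is exactly the argument for reporting intervals at all.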

That’s it for today. The medical AI results are the kind of thing that’ll get cited for years even if the methodology is shakier than the headlines suggest — worth reading the actual papers before forming a strong opinion.
