Good morning. The week’s big thread keeps tightening around Chinese open-weight models: today it’s GLM-5.2 showing up in two separate cybersecurity benchmarks, including one where it edges out Claude Code. Meanwhile Ford has a cautionary tale about firing your experienced humans too soon, and Brown University is dealing with what might be the Ivy League’s largest AI cheating scandal on record.
GLM-5.2 keeps showing up in security benchmarks. Semgrep’s researchers found GLM-5.2 scoring 39% F1 on their IDOR vulnerability benchmark versus Claude Code’s 32%, at roughly $0.17 per vulnerability found — though both lost to Semgrep’s own multimodal pipeline at 53-61%. The Verge picked up the broader thread, reporting that Zhipu AI claims parity with Mythos on cyber tasks, which is now a US national security talking point given the model’s open weights. The HN discussion flagged a fair caveat — Claude Code is an agent harness, not a model — and one commenter predicted Commerce will pressure HuggingFace and OpenRouter to delist Chinese open weights within months. Several developers said they’ve already switched daily coding work to GLM-5.2 after the Fable/GPT-5.6 weekend mess.
Ford rehires the graybeards. After its AI quality control systems missed the mark, Ford brought back 350 veteran engineers and credits the move with hundreds of millions in reduced warranty costs. COO Kumar Galhotra admitted the company had been “relying more and more on automated quality systems” with disappointing results, and VP Charles Poon conceded they’d wrongly assumed AI could produce quality products just by ingesting design specs. The rehired engineers are now training younger staff and reprogramming the AI tools that replaced them. Ford topped JD Power’s Initial Quality Survey among mainstream brands this year.
Brown’s AI cheating scandal goes public. Economics professor Roberto Serrano has identified at least 50 students who used AI on a mathematical economics midterm, and says administrators went silent until formal proceedings started. Commenters on HN were unsympathetic to the framing — one noted the irony of a game theorist not anticipating that when everyone else is cheating, the rational move is to cheat too. The practical consensus: take-home exams are finished, and in-person handwritten assessments are the future, with one professor at Dartmouth describing intro CS curriculum design as “an adversarial problem.”
Central bankers warn the AI boom looks bubbly. Per the Telegraph, the $2 trillion-plus pouring into AI companies has pushed valuations to levels that could trigger a broader crash regardless of whether AI delivers. The HN comments were resigned rather than surprised, with one commenter laying out the trap neatly: if AI fails, crash; if it succeeds and displaces workers, crash. Several noted that boomer retirement liquidation is creating ideal conditions for whatever comes next.
ChatGPT logs hit the courtroom, and a juror pushes back. In the Palisades fire arson trial, prosecutors entered Jonathan Rinderknecht’s ChatGPT conversations as evidence, including his requests for fire imagery and questions about fire liability. The jury deadlocked 10-2 in favor of the defense, ending in a mistrial, and one juror said publicly she uses ChatGPT herself and found it offensive to treat AI chats as evidence of intent. Worth watching as a precedent for how juries treat this kind of digital exhaust.
Codex still has no way to ignore your .env file. A GitHub issue requesting a .codexignore mechanism for secrets is open, with a similar issue closed years ago in favor of a Rust rewrite that never shipped. The HN response was pragmatic-bordering-on-dismissive: any blocklist gives false security because the model can always invoke shell commands to read the files anyway, and proper isolation means file permissions, containers, or opt-in access patterns. The consensus answer is that secrets shouldn’t be sitting in plaintext on disks that an agent can reach in the first place.
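For the curious, here is roughly what an opt-in access pattern can look like, as a minimal Python sketch rather than anything Codex itself does. The allowlist contents and the `run_agent` helper are hypothetical names chosen for illustration. Note that it only scrubs environment variables passed to a child process; it does nothing about files on disk, which still need file permissions or a container.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only these variables are passed to the child process.
ALLOWED_VARS = ("PATH", "HOME", "LANG")


def allowlisted_env(allowed=ALLOWED_VARS, source=None):
    """Return a copy of the environment containing only opted-in variables."""
    source = os.environ if source is None else source
    return {name: source[name] for name in allowed if name in source}


def run_agent(cmd, allowed=ALLOWED_VARS):
    """Run a command (a stand-in for an agent) with the scrubbed environment."""
    return subprocess.run(
        cmd,
        env=allowlisted_env(allowed),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    os.environ["FAKE_SECRET"] = "do-not-leak"
    probe = "import os; print('FAKE_SECRET' in os.environ)"
    result = run_agent([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

The design choice is the inversion: instead of listing what to hide, which the commenters argue is always incomplete, the parent lists what to share and everything else is absent by default.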
One developer asked Claude Code to read their MRI. A patient used Claude Code Opus 4.8 to analyze their shoulder MRI DICOM files after feeling rushed by their orthopedist, and the model flagged that the prescribed shockwave therapy went against current clinical guidelines for tendinopathy without calcification. Two radiologists turned up in the HN comments to caution that AI models are still “absolutely terrible” at medical imaging due to thin public training data, though others shared cases where AI surfaced things time-pressed doctors missed on chronic conditions. The author’s own framing was the honest part: AI breaks the comforting trust we place in experts before it’s earned the trust itself.
Two takes on token spending. A blog post arguing tokenmaxxing isn’t dead got a chilly HN reception. It claims Meta’s much-mocked policy was a deliberate adoption-forcing strategy and that “compounding correctness” now makes heavy token use worthwhile, but commenters noted they’ve heard the compounding-correctness promise for over a year without seeing it. Separately, Wayfinder Router takes the opposite tack: a deterministic offline scorer that routes easy prompts to cheap models and hard ones to capable ones, no model call required. The main objection to the router is context continuity: routing different turns of one conversation to different models breaks history and invites hallucination at the handoff.
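To make the routing idea concrete, here is a toy Python sketch of a deterministic scorer that looks only at the prompt text. It is not Wayfinder Router’s actual algorithm: the keyword list, weights, threshold, and model names are all assumptions made up for illustration.

```python
import re

# Hypothetical signals and weights; not taken from Wayfinder Router.
HARD_KEYWORDS = (
    "prove", "refactor", "debug", "optimize", "architecture", "concurrency",
)


def difficulty_score(prompt: str) -> float:
    """Score a prompt in [0, 1] from local text features only (no model call)."""
    words = re.findall(r"\w+", prompt.lower())
    # Longer prompts are assumed harder, capped at 200 words.
    length_signal = min(len(words) / 200.0, 1.0)
    # Count words that match the hard-task keyword list, capped at 3 hits.
    keyword_signal = min(sum(w in HARD_KEYWORDS for w in words) / 3.0, 1.0)
    # Crude check for pasted Python code.
    code_signal = 1.0 if "def " in prompt or "class " in prompt else 0.0
    return 0.4 * length_signal + 0.4 * keyword_signal + 0.2 * code_signal


def route(prompt: str, threshold: float = 0.25) -> str:
    """Pick a model tier; the same prompt always gets the same answer."""
    if difficulty_score(prompt) >= threshold:
        return "capable-model"
    return "cheap-model"


if __name__ == "__main__":
    print(route("What is the capital of France?"))
    print(route("Debug and refactor this concurrency bug in my architecture"))
```

A scorer like this sees one prompt at a time, which is exactly where the continuity objection bites. One plausible mitigation would be to route once per conversation rather than once per turn.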
That’s it for today. The Ford story is the one I keep thinking about — a useful counterweight to the week’s other thread about AI displacing whole categories of work.