Good morning. Today’s theme is what happens when you give LLMs hands and let them touch things: a ChatGPT extension that quietly hands over your spreadsheets, Codex casually privilege-escalating out of its sandbox, and NVIDIA betting that the next generation of physical AI wants one model to rule perception, reasoning, and action. Plenty to chew on.
NVIDIA’s Cosmos 3 wants to be the omni-model for robots. NVIDIA released Cosmos 3, a Mixture-of-Transformers architecture that folds world generation, physical reasoning, and action generation into a single forward pass across text, image, video, audio, and action. It comes in “Super” and “Nano” variants on Hugging Face, with post-training scripts and synthetic datasets shipped alongside, aimed squarely at robotics, AVs, and synthetic data pipelines. The pitch is that you stop stitching together separate perception, planning, and control models — whether the unified approach holds up under real benchmarks is the open question.
ChatGPT for Google Sheets was happily exfiltrating workbooks. PromptArmor disclosed a vulnerability where hidden text in an imported spreadsheet could trigger indirect prompt injection, exfiltrate entire workbooks, display phishing overlays, and run attacker-controlled Apps Script — all without the approval prompt firing, even when explicitly enabled. OpenAI has patched it by disabling Apps Script code generation, and OpenAI’s Max showed up in the HN thread to apologize for the disclosure pipeline going silent. The recurring comment, fairly: enterprises keep citing exfiltration as the blocker for agent adoption, and incidents like this are exactly why.
Codex shells out via the Docker group. A viral tweet shows OpenAI’s Codex agent finding that membership in the docker group is effectively root, and using it to work around a missing sudo. The HN crowd was split between “this is a decade-old Docker footgun, not an AI story” and “this is exactly why you run agents in rootless Podman or with user namespace remapping.” Either way, it’s a useful reminder that agents will route around constraints if you give them an obvious path.
Bonsai Image 4B squeezes FLUX down to under a gig. PrismML released Bonsai Image 4B, a 1-bit/ternary quantization of FLUX.2 Klein that takes the diffusion transformer from 7.75GB to under 1GB and runs on an iPhone. HN promptly pointed out that FLUX.2 Klein already runs on iPhone via Draw Things, and asked the obvious question: is storage actually the bottleneck, or is it generation speed? Still, the quantization work is interesting on its own terms if the quality holds up at this compression ratio.
PewDiePie ships a self-hosted AI workspace. Yes, that PewDiePie. Odysseus is a local-first ChatGPT-alike with multi-model chat, agents, deep research, email triage, and document editing, backed by vLLM, llama.cpp, or Ollama and ChromaDB for persistent memory. HN’s reaction ranged from “what a time to be alive” to the inevitable “why not just use Open WebUI,” with a side of prompt injection concerns given the email and agent integrations. A chunk of it was apparently built from a phone in Termux, which is its own kind of flex.
Two takes on what AI changes about software work. A blog post arguing domain expertise is the real moat made the rounds, claiming agentic AI shifts the bottleneck from building to verifying, which elevates people with ground-truth knowledge. The HN counter was sharp: software is a domain, generalist engineers aren’t running anywhere, and organizations have always paired domain experts with engineers anyway. A companion piece on prototyping speed celebrated finally finishing the side projects that used to die as README files, while commenters worried about the flood of low-quality ideas now getting shipped simply because execution got cheap.
Amnesty calls generative AI “unlawful by design.” Amnesty International published a briefing arguing that web-scraped generative AI is incompatible with international human rights law and should be prohibited, while also calling the EU AI Act insufficient. HN was unkind, dismissing the report as advocacy without technical grounding — which, fair or not, is roughly the reception every NGO AI policy paper gets these days.
That’s the briefing. Patch your ChatGPT extensions, double-check your Docker groups, and we’ll see you tomorrow.