Good morning. Today’s digest is dominated by Google shipping things — multiple things — while the edges of AI deployment keep getting pushed outward in ways that would have seemed implausible just a couple of years ago.
Google had a busy Tuesday. The company launched a native Gemini app for Mac (macOS 15+), featuring a floating chat bubble triggered by Option + Space and screen-context sharing — joining OpenAI and Anthropic, which have had Mac apps for a while now. The same day, Google dropped Gemini 3.1 Flash TTS, a text-to-speech model supporting 70+ languages with granular audio style controls, SynthID watermarking, and an Elo score of 1,211 on the Artificial Analysis leaderboard — rolling out across AI Studio, Vertex, and Google Vids.
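If you want to try the TTS model from code, it surfaces through the same Gemini API as the rest of the lineup. Below is a minimal sketch with the google-genai Python SDK; the model id is my guess from the announcement naming, and the request shape mirrors the existing Gemini TTS preview, where delivery style is steered with plain-language instructions in the prompt.

```python
# Minimal TTS sketch using the google-genai Python SDK.
# Assumptions: the model id below is inferred from the announcement and may differ;
# the config mirrors the existing Gemini TTS preview (AUDIO response modality,
# a prebuilt voice, style controlled via natural-language instructions).
import wave

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.1-flash-tts",  # hypothetical id; check AI Studio for the real one
    contents="Say this warmly, with a slight pause before the name: Welcome back, Ada.",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# The current preview returns raw 16-bit mono PCM at 24 kHz; whether the new model
# keeps that format is another assumption here. Wrap it in a WAV container to play it.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("hello.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```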
On-device AI is having a moment. Google’s Gemma 4 is running fully offline on iPhones via the AI Edge Gallery app at around 16 tokens/sec on an iPhone 16 Pro — though it’s routing through the GPU rather than Apple’s Neural Engine, which hits the battery harder. There’s a catch for developers: Apple’s App Store rule 2.5.2 is apparently blocking third-party apps from shipping local LLMs, which is its own quiet drama. Meanwhile, in browser territory, a 1-bit quantized Bonsai 1.7B model weighing just 290MB is now running inference locally via WebGPU — technically impressive enough that commenters are invoking “two years ago we were debating if 7B could run on consumer GPUs at all,” though the honest consensus is that current Bonsai models hallucinate badly enough to limit practical use.
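That 290MB figure is roughly what the arithmetic predicts for a mostly-1-bit 1.7B model. A quick back-of-the-envelope check; the explanation for the leftover megabytes is my assumption, not something stated in the Bonsai release:

```python
# Sanity check on the ~290 MB download for a 1-bit-quantized 1.7B model.
params = 1.7e9
packed_mb = params * 1 / 8 / 1e6   # 1 bit per weight, packed into bytes
print(f"pure 1-bit weights: ~{packed_mb:.0f} MB")  # ~213 MB

# The remaining ~75-80 MB is consistent with some tensors (embeddings, norms,
# possibly the output head) being kept at higher precision -- an assumption here,
# not a detail from the release.
```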
Agentic AI is getting more plumbing. OpenAI updated its Agents SDK with sandboxing capabilities that isolate agent access to specific files and tools, plus a testing harness for long-horizon tasks — a clear enterprise pitch as the agentic space gets more crowded. Adobe, for its part, launched the Firefly AI Assistant (formerly “Project Moonlight”), a cross-app agent that can orchestrate tasks across Photoshop, Premiere, Lightroom, and Illustrator from text prompts, with a public beta due in the coming weeks.
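On the OpenAI side, I haven't seen documentation for the new sandboxing or harness APIs yet, so the sketch below only approximates the file-isolation idea using the openai-agents Python SDK as it exists today: the agent's sole filesystem access is a tool whose implementation enforces an allow-list, which is roughly the behavior the update reportedly moves into the SDK itself.

```python
# Sketch of file-scoped agent access with the existing openai-agents Python SDK.
# The announced sandboxing API is NOT shown here (no docs seen yet); instead the
# allow-list is enforced inside the tool, which approximates the same idea.
from pathlib import Path

from agents import Agent, Runner, function_tool

# Hypothetical workspace files the agent is permitted to read.
ALLOWED = {Path("reports/q3.md").resolve(), Path("reports/q4.md").resolve()}

@function_tool
def read_file(path: str) -> str:
    """Read one of the files the agent is allowed to see."""
    p = Path(path).resolve()
    if p not in ALLOWED:
        return f"Access to {path} is not permitted."
    return p.read_text()

agent = Agent(
    name="report-summarizer",
    instructions="Summarize the requested report using only the read_file tool.",
    tools=[read_file],
)

result = Runner.run_sync(agent, "Give me three bullet points on reports/q3.md.")
print(result.final_output)
```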
Two more things worth a quick look. OpenAI launched ChatGPT for Excel as an add-in, but early reports are rough — users are citing 15-20 minute waits for basic tasks and accuracy issues with financial modeling, which is a bad combination for a spreadsheet tool. Separately, Darkbloom is pitching a distributed inference network that pays owners of idle Apple Silicon Macs to serve AI requests — but Hacker News commenters are poking holes in the math (realistic revenue closer to $67/month at full utilization) and raising a legitimate concern: Apple Silicon lacks the kind of trusted execution environment that would make the “private inference” claim verifiable.
Finally, for the curious: a researcher shared a beautiful visualization of decoder block activations evolving during LLM training, showing early and late layers converging while middle layers stay chaotic — complete with mysterious “pulse” events that correlate with major shifts. The comment section has dubbed it Schrödinger’s LLM, which feels about right.
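If you want to reproduce the general shape of that plot on your own checkpoints, the recipe is simple: log one summary statistic per decoder block at each checkpoint, then draw each block as a line over training steps. A minimal sketch, with random placeholder values standing in for the layers-by-checkpoints matrix of activation norms you would actually collect:

```python
# Sketch of the kind of plot described: one line per decoder block, tracking a
# per-layer activation statistic (e.g. mean hidden-state L2 norm) across checkpoints.
# The data below is a random placeholder; in practice you'd fill stats[layer, step]
# from forward hooks run over each saved checkpoint.
import numpy as np
import matplotlib.pyplot as plt

n_layers, n_checkpoints = 24, 50
rng = np.random.default_rng(0)
stats = rng.normal(size=(n_layers, n_checkpoints)).cumsum(axis=1)  # placeholder only

steps = np.arange(n_checkpoints) * 1000
fig, ax = plt.subplots(figsize=(8, 4))
for layer in range(n_layers):
    ax.plot(
        steps,
        stats[layer],
        color=plt.cm.viridis(layer / (n_layers - 1)),  # early blocks dark, late blocks bright
        lw=1,
        label=f"block {layer}" if layer in (0, n_layers - 1) else None,
    )

ax.set_xlabel("training step")
ax.set_ylabel("mean activation norm (placeholder)")
ax.set_title("Per-block activation statistics over training")
ax.legend()
plt.tight_layout()
plt.show()
```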
It’s a day of breadth over depth — lots of things shipping, few of them finished. More tomorrow.