Good morning. Model release day on a few fronts: Mistral dropped a 128B dense Medium 3.5, IBM refreshed Granite, Qwen shipped a kernel-level optimization for Hopper-and-up GPUs, and inclusionAI floated yet another trillion-parameter model. Meanwhile, Musk’s turn on the witness stand in the OpenAI trial appears to have gone poorly for him, and an Nvidia exec admits the quiet part: at today’s prices, the compute is more expensive than the humans.
Mistral Medium 3.5 lands. Mistral has released Medium 3.5, a 128B dense model with a 256k context window scoring 77.6% on SWE-Bench Verified, available as open weights under a modified MIT license and runnable on four GPUs. Alongside it come remote cloud coding agents in Mistral Vibe and a “Work mode” in Le Chat. The HN reaction is that it punches above its weight for a 128B model — a credible alternative to GLM 5.1 (~400GB at Q4) or Kimi K2.5 (~600GB at Q4) — but not a frontier challenger. On the r/LocalLLaMA thread, the bigger surprise is that it’s dense at 128B, an unusual choice in a year of MoE everything; one commenter is already running a Q4 build on a Strix Halo.
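For a sense of why a dense 128B at Q4 is attractive locally, here is the back-of-envelope arithmetic (the 4.5 bits/param figure is my assumption for Q4 with quantization overhead, not from the release notes):

```python
# Rough weight footprint of a quantized dense model.
# Assumption: Q4 quantization lands around 4.5 bits/param once scales
# and zero-points are counted; the exact figure varies by quant scheme.

def q4_footprint_gb(params_billions: float, bits_per_param: float = 4.5) -> float:
    """Approximate on-disk / VRAM weight size in GB."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

print(q4_footprint_gb(128))  # → 72.0
```

Roughly 72GB of weights, which is why it fits on four GPUs or a Strix Halo with enough unified memory, where the ~400-600GB Q4 MoE giants mentioned above do not.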
IBM Granite 4.1 and Ling-2.6-1T round out the release wave. IBM introduced Granite 4.1 in 3B, 8B, and 30B sizes plus a Granite Vision 4.1 4B, with marketing copy claiming frontier-competitive benchmarks. Reaction in r/LocalLLaMA is unimpressed — one commenter notes the 30B scores 15 on the AA index, the same as Gemma 4 E4B and Qwen 3.5 2B. Separately, inclusionAI posted Ling-2.6-1T, another trillion-parameter agentic model whose benchmarks notably compare against last-gen GLM-5, DeepSeek 3.2, and Kimi 2.5; an early hands-on report describes it bungling a basic Hugging Face Transformers script with 250 lines of dead code and a comment claiming it was “tested & working.”
Qwen ships FlashQLA — for the H100 crowd. Qwen released FlashQLA, a kernel optimization library requiring SM90+, CUDA 12.8, and PyTorch 2.8. Translation: H100s and Blackwell only, with a roughly 30% prefill speedup and nothing for token generation. As one commenter put it, “LOCAL for those of us with an H100 sitting around.”
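The SM90+ requirement is a compute-capability gate, nothing more. A minimal sketch of what such a gate looks like (the function and path names are hypothetical; on a live system the tuple comes from `torch.cuda.get_device_capability()`):

```python
# Dispatch on CUDA compute capability: Hopper is SM 9.0, Blackwell 10.x+,
# so "SM90+" excludes every Ada (8.9) and Ampere (8.x) card.
# kernel_path and the path names are illustrative, not FlashQLA's API.

def kernel_path(capability: tuple[int, int]) -> str:
    """Pick a kernel path from a (major, minor) compute capability."""
    return "flashqla" if capability >= (9, 0) else "fallback"

print(kernel_path((9, 0)))  # → flashqla  (H100, Hopper)
print(kernel_path((8, 9)))  # → fallback  (RTX 4090, Ada)
```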
Speaking of H100 envy: 16 DGX Sparks looking for a job. Someone showed up in r/LocalLLaMA with about $75K of DGX Spark hardware and asked what to run on it. Suggestions included Kimi K2.6 via vLLM and a 16-node DeepSeek V4 setup with unmerged PRs, though everyone agreed token generation will be slow regardless. The replies were mostly variations on “how do you end up with that and no plan” and “sell them, buy H100s.”
Musk’s day in court. The Musk v. Altman trial we noted yesterday saw Musk himself take the stand as the first witness — and by The Verge’s account, it didn’t go well, with Musk dodging yes/no questions, appearing to contradict prior testimony, and scolding opposing counsel. The Verge has also published a running catalog of trial exhibits including 2015-era emails showing Brockman and Sutskever’s concerns about Musk’s control, Altman leaning on Y Combinator for early support, and Jensen Huang gifting OpenAI a supercomputer. Musk is seeking removal of Altman and Brockman plus up to $150B in damages.
Nvidia exec: AI still costs more than the humans it replaces. A Fortune piece cites a senior Nvidia exec and a 2024 MIT study finding AI compute currently exceeds labor costs in most roles, with automation economically viable for only 23% of vision-related jobs. That hasn’t stopped Big Tech from committing $740B in 2026 AI capex (up 69%) while shedding 92,000+ tech workers. Commenters on r/artificial made two reasonable points in opposite directions: layoffs blamed on AI are mostly cost-cutting cover, but compute is on a steep declining curve while salaries aren’t, so today’s snapshot won’t hold for long.
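The commenters' second point, a steep compute cost curve against flat salaries, is easy to make concrete. A toy break-even calculation, with every number hypothetical rather than taken from the Fortune piece:

```python
# Toy break-even model (all inputs hypothetical, not from the article):
# compute cost falls by a fixed fraction each year, labor cost stays flat.

def years_to_breakeven(compute_cost: float, labor_cost: float,
                       annual_decline: float) -> int:
    """Years until falling compute cost drops to or below flat labor cost.
    Returns 0 if compute is already cheaper."""
    years = 0
    while compute_cost > labor_cost:
        compute_cost *= 1 - annual_decline
        years += 1
    return years

# Compute at 3x the human's cost today, falling 30%/yr:
print(years_to_breakeven(3.0, 1.0, 0.30))  # → 4
```

Which is the point: even where today's snapshot favors the human, a steep enough cost curve flips it within a few years.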
“Alignment whack-a-mole” surfaces memorized books. A new paper shows that fine-tuning aligned LLMs — including GPT-4o, Gemini, and DeepSeek — on plot-summary-to-excerpt tasks can reactivate verbatim recall of copyrighted books that the models’ alignment training would otherwise suppress. The HN discussion leans into the legal implications: one commenter predicts a Napster-style copyright reckoning once a redistribution suit against an LLM user lands. Others took it as confirmation that LLMs are, structurally, lossy compression over their training data, dressed up in different language for funding purposes.
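For context on what “verbatim recall” means operationally, work in this area typically scores overlap between a model completion and the source text. A crude sketch of one such metric (my construction, not necessarily the paper's method): the longest contiguous word run shared between two strings.

```python
# Longest contiguous word sequence shared by two texts, a rough proxy
# for verbatim memorization. Illustrative only.

def longest_shared_ngram(a: str, b: str) -> int:
    """Length in words of the longest common contiguous word run."""
    aw, bw = a.split(), b.split()
    best = 0
    prev = [0] * (len(bw) + 1)  # DP row: run lengths ending at each j
    for i in range(1, len(aw) + 1):
        cur = [0] * (len(bw) + 1)
        for j in range(1, len(bw) + 1):
            if aw[i - 1] == bw[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

book = "it was the best of times it was the worst of times"
completion = "the model emitted it was the best of times verbatim"
print(longest_shared_ngram(book, completion))  # → 6
```

Alignment keeps scores like this low on excerpt-style prompts; the paper's finding is that narrow fine-tuning brings them back up.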
That’s the rundown. Watch for closing arguments and a verdict in Musk v. Altman in the coming days, and brace for at least one more model release before the week ends.