🎯 Top 3 Things to Know
1. Google made Gemini 3.5 Flash the default Gemini model and shipped a personal agent at I/O. The headline from Tuesday's keynote was less the flagship Pro and more what Google did with its low-cost tier. Gemini 3.5 Flash is now globally available through the Gemini app and AI Mode in Search, and Google claims it beats the prior Gemini 3.1 Pro on coding and agentic benchmarks at roughly four times the speed. Alongside it: Gemini Spark, a proactive agent in the Gemini app that can run background workflows for AI Ultra subscribers, and Gemini Omni, a video model that takes text, images, and clips as input. The friction this addresses is that Google has been quiet on frontier shipping while Anthropic and OpenAI captured the agent narrative. Flash-as-default is a pricing move against GPT-5.5 Instant and Claude Haiku 4.5 on the cost-per-quality frontier. Worth checking whether independent reproductions back up the 4x-speed claim on real coding workloads and how Spark behaves outside the Google app surface. MarkTechPost
2. OpenAI is shipping Codex into customers' own data centers via Dell. The OpenAI-Dell partnership announced Monday and detailed further on Tuesday lets Codex run on the Dell AI Data Platform and AI Factory, so the agent can sit next to enterprise codebases, documentation, and systems of record without that data leaving the customer's network. OpenAI says Codex now has more than 4 million weekly developer users and is moving past code generation into agentic workflows like context-gathering, reporting, and lead routing. The friction is that regulated buyers in finance, healthcare, and government have been blocked from public-cloud AI by data-residency and audit constraints. This is OpenAI's first explicit hybrid and on-prem distribution play. Worth watching whether the deal pulls procurement budgets toward Dell infrastructure inside accounts where Microsoft Azure was the default OpenAI path. OpenAI blog
3. Anthropic added MCP tunnels and self-hosted sandboxes to Claude Managed Agents. The Tuesday update lets enterprise customers route Claude agents to MCP servers sitting inside their private network without exposing those servers to the public internet, and run agent code execution inside customer-managed sandboxes instead of Anthropic-hosted ones. The friction has been familiar to anyone deploying agents past a proof of concept: security teams refuse to open inbound holes for tool access, and legal teams refuse to send sensitive payloads to a vendor execution environment. Both features close that loop without changing the agent surface. Worth checking whether the tunneling protocol becomes part of the public MCP spec or stays Anthropic-specific, and whether self-hosted sandboxes meaningfully change the cost model for high-volume agent workloads. 9to5Mac
🚀 Frontier Models & Features
- Gemini 3.5 Pro is in internal testing, with Google flagging a public rollout next month. The model is expected to land against Claude Mythos Preview and a rumored GPT-5.5 mid-cycle update. Business Standard
- Claude Code's Fast mode now defaults to Opus 4.7 (previously Opus 4.6), with background sessions surfaced through /resume and projected context-cost estimates added to the plugin marketplace. Claude Code changelog
🔬 Research Worth Reading
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling (Zheng, Liu, Huang et al. / UMD, Microsoft, UNC). arXiv
- TL;DR: Treats test-time scaling as a controller-synthesis problem and lets an LLM agent search over branch, continue, probe, prune, and stop decisions on pre-collected trajectories, instead of hand-tuning width-depth heuristics.
- Stat: The full discovery loop costs $39.90 and 160 minutes of wall clock, and the resulting controllers improve the accuracy-cost tradeoff on math benchmarks while generalizing to held-out tasks and model scales.
- Apply it: Stop hand-coding self-consistency budgets. Collect a few hundred reasoning trajectories with intermediate probe scores, then let a controller-discovery loop on the cheaper model in the family pick the branching policy that minimizes tokens at target accuracy.
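To make the "Apply it" step concrete, here is a minimal sketch of replaying pre-collected trajectories under a branch/prune/stop policy and picking the cheapest policy that hits a target accuracy. This is not the paper's controller or its discovery loop: the Trajectory fields, the three policy knobs (prune_below, stop_margin, max_branches), the half-cost pruning assumption, and the plain grid search standing in for an LLM-driven search are all hypothetical choices for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Trajectory:
    answer: str         # final answer this sampled reasoning path reached
    tokens: int         # tokens spent generating the full path
    probe_score: float  # hypothetical mid-path confidence probe in [0, 1]


def run_controller(trajectories, prune_below, stop_margin, max_branches):
    """Replay pre-collected trajectories under one branch/prune/stop policy.

    Returns (majority_answer, tokens_spent); the answer is None if every
    branch was pruned.
    """
    votes = Counter()
    spent = 0
    for traj in trajectories[:max_branches]:   # branch: open at most max_branches paths
        if traj.probe_score < prune_below:     # prune: weak probe score, abandon early
            spent += traj.tokens // 2          # assumption: pruning happens halfway through
            continue
        spent += traj.tokens                   # continue: pay for the full path
        votes[traj.answer] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= stop_margin:  # stop: leader is far enough ahead
            break
    answer = votes.most_common(1)[0][0] if votes else None
    return answer, spent


def evaluate(policy, problems):
    """Return (accuracy, mean tokens per problem) for one policy.

    problems is a list of (trajectories, gold_answer) pairs.
    """
    correct = 0
    tokens = 0
    for trajectories, gold in problems:
        answer, spent = run_controller(trajectories, **policy)
        correct += answer == gold
        tokens += spent
    return correct / len(problems), tokens / len(problems)


def cheapest_policy(problems, target_accuracy, grid):
    """Return (policy, accuracy, mean_tokens) for the cheapest policy that
    meets target_accuracy, or None if no policy in the grid does."""
    best = None
    for policy in grid:
        accuracy, cost = evaluate(policy, problems)
        if accuracy >= target_accuracy and (best is None or cost < best[2]):
            best = (policy, accuracy, cost)
    return best
```

In use, problems would hold a few hundred (trajectories, gold_answer) pairs sampled from the cheaper model, and grid would be a list of dicts such as {"prune_below": 0.5, "stop_margin": 2, "max_branches": 4}. Because the trajectories are already collected, each policy is scored by replay alone, with no new model calls.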
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs (Bian, Yu, Venkataraman, Park / U. Wisconsin & Amazon, ICLR 2026). arXiv
- TL;DR: Extends Chinchilla scaling laws with architectural variables (hidden size, MLP-to-attention ratio, grouped-query attention) so the same compute budget can be allocated for inference cost, not just training loss.
- Stat: The conditional scaling law identifies architectures that hold accuracy while shifting the inference-cost frontier, with searchable architecture-data joint optima rather than the fixed shapes implied by classic scaling laws.
- Apply it: Next time a model card lists FLOPs-matched comparisons, ask whether attention width and GQA group count were held fixed. If they were, the inference-cost story is incomplete and the published numbers undersell or oversell the deployment math.
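One reason the GQA group count matters for deployment math is that KV-cache memory scales with the number of key/value heads, not query heads. The sketch below is standard KV-cache arithmetic, not the paper's conditional scaling law, and the model shape used in the usage note (32 layers, head dimension 128, 16-bit cache entries) is a hypothetical example rather than a figure from the paper.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies across all layers.

    The leading factor of 2 covers one key vector and one value vector
    per KV head per layer. bytes_per_elem=2 assumes fp16/bf16 entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len,
                 batch_size=1, bytes_per_elem=2):
    """Total KV cache in GiB for a batch of sequences at a given context length."""
    per_token = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return per_token * context_len * batch_size / 2**30


# Hypothetical shape: 32 layers, head_dim 128, 8192-token context.
full_mha = kv_cache_gib(n_layers=32, n_kv_heads=32, head_dim=128, context_len=8192)
gqa_8 = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
```

For this shape, full multi-head attention (32 KV heads) needs 4 GiB of cache per 8192-token sequence, while 8 KV heads needs 1 GiB. Two models with matched training FLOPs can therefore differ by 4x in how many concurrent sequences fit on the same accelerator.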
🏢 Enterprise in the Wild
- Codex now reports more than 4 million weekly developer users, per OpenAI's Dell announcement, with a growing share of activity coming from agentic flows rather than single-turn code completion. OpenAI
- Google launched Health Coach, an AI-powered fitness and wellness layer inside the Google ecosystem, globally on May 19 as part of the I/O rollout. The feature is the largest consumer-health AI surface Google has shipped outside of Fitbit. CNBC
🛠️ Tooling & Ecosystem
- Claude Managed Agents MCP tunnels keep tool servers inside a customer's network while still letting hosted agents call them. Anthropic / 9to5Mac
- Microsoft Semantic Kernel now has first-class MCP support, putting Microsoft alongside Anthropic, OpenAI, and Google as native MCP implementers. The ecosystem has crossed 9,400 public servers and 97 million monthly SDK downloads. Microsoft DevBlogs
⚖️ Policy & Regulation
- No new federal or EU action moved overnight. The May 7 EU Digital Omnibus on AI provisional agreement (covered in yesterday's briefing) remains the open file, with formal adoption expected before August 2, 2026.
📌 Watch List
- Frontier price-performance: Flash-default pricing puts new pressure on Haiku and GPT-5.5 Instant tiers.
- On-prem and hybrid AI: Codex-on-Dell and Claude self-hosted sandboxes point to a real shift in where inference runs.
- Cost-aware reasoning: AutoTTS is the second paper this week to attack hand-tuned test-time budgets.
- Agent infrastructure: MCP servers continue to absorb the integration layer that custom SDKs used to occupy.