🎯 Top 3 Things to Know
1. Google I/O dropped Gemini 3.5 Flash, the Gemini Spark agent, and Omni, a world model. Gemini 3.5 Flash slots in as the default model in the Gemini app, with Google pitching it at roughly a third of the cost of comparable frontier models. Spark is a long-running personal agent that lives on Google Cloud VMs and reaches across Workspace and the open web. Omni is a video-and-physics world model. The combined message is that Google is willing to compete on price at the small end while pushing autonomy at the high end. Teams running multimodal pipelines should benchmark 3.5 Flash on cost per quality this week, especially for high-volume document and video workloads where a single model could replace a stitched pipeline. CNBC · Google Developers Blog
2. Anthropic shipped self-hosted sandboxes and MCP tunnels for Claude Managed Agents. Sandboxes move tool execution off Anthropic's infrastructure and onto a company's own environment, or a managed runner like Cloudflare, Daytona, Modal, or Vercel. The agent loop itself still runs on Anthropic. MCP tunnels open a single outbound, end-to-end encrypted connection from an internal network out to the agent, so internal databases and ticketing systems become tools without a public endpoint. This is the answer to the most common reason agent pilots stall in regulated industries: the security team won't let execution or data leave the perimeter. Worth re-pricing any agent project that died last quarter on a security review. Anthropic blog · InfoQ
3. The European Parliament and the Council reached a provisional deal to delay big chunks of the AI Act and add two new prohibitions. Use-based high-risk obligations (Annex III) slip from August 2026 to December 2027. Product-regulated high-risk obligations (Annex I) slip from August 2027 to August 2028. Member states get an extra year to stand up regulatory sandboxes. The deal also adds two outright bans: AI used to generate non-consensual intimate material and AI used to generate CSAM. The headline is timeline relief, but the new bans take effect on the original schedule, so anyone shipping image or video generation into the EU has work to do regardless of the delay. Council press release · Covington analysis
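For the cost-per-quality benchmark suggested in item 1, a minimal Python sketch of the arithmetic follows. Everything in it is an assumption for illustration: the `ModelRun` type, the model names, the token counts, the per-million-token prices, and the quality scores are placeholders, not published Gemini pricing or benchmark results. Swap in token counts from a real workload and quality scores from an internal eval set.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """One model's result on a fixed evaluation workload (all values hypothetical)."""
    name: str
    input_tokens: int
    output_tokens: int
    usd_per_m_input: float   # placeholder price per million input tokens
    usd_per_m_output: float  # placeholder price per million output tokens
    quality: float           # score in (0, 1] from your own eval set

    def cost_usd(self) -> float:
        """Total spend for the workload at the given per-million-token prices."""
        total = (self.input_tokens * self.usd_per_m_input
                 + self.output_tokens * self.usd_per_m_output)
        return total / 1_000_000

    def cost_per_quality(self) -> float:
        """Dollars spent per unit of quality; lower is better."""
        if self.quality <= 0:
            raise ValueError("quality must be positive")
        return self.cost_usd() / self.quality


def rank(runs: list[ModelRun]) -> list[ModelRun]:
    """Order runs from cheapest to most expensive per unit of quality."""
    return sorted(runs, key=lambda r: r.cost_per_quality())


# Placeholder numbers only: same workload, two hypothetical models.
candidate = ModelRun("small-model", 2_000_000, 500_000, 1.0, 4.0, quality=0.8)
baseline = ModelRun("frontier-model", 2_000_000, 500_000, 3.0, 12.0, quality=0.9)

for run in rank([candidate, baseline]):
    print(f"{run.name}: ${run.cost_usd():.2f} total, "
          f"${run.cost_per_quality():.2f} per quality point")
```

Dividing by quality rather than comparing raw spend keeps a cheaper model from winning by default when its outputs are noticeably worse on the workload that matters.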
🚀 Frontier Models & Features
- OpenAI launched a self-serve advertising platform inside ChatGPT. Advertisers can buy on a CPM or CPC basis and plug in through agency holding companies and ad-tech firms including Dentsu, Omnicom, Publicis, WPP, Adobe, and Criteo. OpenAI has reportedly told investors it is targeting $2.5B in ad revenue this year. The question of how free-tier chat assistants get paid for is now settled in favor of advertising. TechCrunch.
- Grok 4.25 is now in wide release on X Premium and via the xAI API, billed as an upgrade over the April 4.20 beta with stronger reasoning and fewer hallucinations. No new benchmark numbers from xAI yet. llm-stats.
🔬 Research Worth Reading
- Automated Interpretability and Feature Discovery in Language Models with Agents (Marin-Llobet, Ferrando / Harvard). arXiv
- TL;DR: Two coupled agent loops do mechanistic interpretability for you. One refines competing hypotheses about what a feature does; the other roams the activation space with a k-nearest-neighbor graph to find features worth examining in the first place.
- Stat: On Gemma-2 and weight-sparse transformer MLP neurons, the agent improves over one-shot auto-interpretation and surfaces language-specific and safety-relevant features with auditable explanation traces.
- Apply it: If you currently rely on sparse-autoencoder dashboards and human spot-checking to make sense of internal features, swap in an agent loop for the discovery step this week and compare which features it surfaces against your existing curated list.
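To make the discovery step in the TL;DR concrete, here is a minimal Python sketch of a k-nearest-neighbor graph over feature vectors and a budgeted walk across it. This is not the paper's implementation: the brute-force cosine similarity, the breadth-first traversal, the function names, and the toy 2-D vectors are all assumptions chosen for illustration. A real pipeline would use actual activation or decoder vectors and would hand each visited feature to the hypothesis-refinement loop.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_graph(vectors: list[list[float]], k: int) -> dict[int, list[int]]:
    """Brute-force k-NN graph: each feature index maps to its k most similar others."""
    graph: dict[int, list[int]] = {}
    for i, v in enumerate(vectors):
        scored = [(cosine(v, w), j) for j, w in enumerate(vectors) if j != i]
        # Highest similarity first; break ties by lower index for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[i] = [j for _, j in scored[:k]]
    return graph


def explore(graph: dict[int, list[int]], start: int, budget: int) -> list[int]:
    """Breadth-first walk from `start`, visiting at most `budget` features."""
    visited = [start]
    queue = deque([start])
    while queue and len(visited) < budget:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
                if len(visited) >= budget:
                    break
    return visited


# Toy feature vectors (hypothetical): two tight clusters in 2-D.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(features, k=1)
candidates = explore(graph, start=0, budget=3)
print(graph)
print(candidates)
```

With `k=1` on this toy data the walk stays inside the starting cluster, which is the point of using a neighbor graph: exploration follows similar features instead of sampling at random. Raising `k` lets the walk cross between clusters.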
🏢 Enterprise in the Wild
At Knowledge 2026, ServiceNow customers reported concrete deflection numbers from in-production AI specialists. Docusign is targeting autonomous resolution of 90% of internal IT tickets. Honeywell reports its AI assistant has eliminated the majority of service desk conversations. The city of Raleigh reports a 98% deflection rate on employee requests, the equivalent of a month of staff time saved. Across ServiceNow's customer base, AI specialists now resolve 91% of cases without reassignment. The pattern: governed, narrow agents inside a workflow system are clearing real ticket volume, not just answering FAQs in a chat box. Fortune · ServiceNow newsroom
🛠️ Tooling & Ecosystem
- Anthropic Claude Managed Agents added self-hosted sandboxes (public beta) and MCP tunnels (research preview). Covered as #2 above. Anthropic blog.
- Google made over 85 I/O sessions, codelabs, and updates available on demand as of today, including new APIs around the Spark agent runtime and the Omni world model. Useful for teams evaluating Gemini as a build-on platform. Google Developers Blog.
⚖️ Policy & Regulation
- EU AI Act Digital Omnibus provisional deal: most high-risk compliance dates slip by 12 to 16 months, with two new prohibitions (non-consensual intimate imagery, CSAM) layered in. Covered as #3 above. Council press release.
- Colorado AI Act enforcement was stayed by the U.S. District Court for the District of Colorado on April 27. A replacement bill, SB 26-189, passed both chambers in early May and was sent to the governor, who is expected to sign before adjournment. Companies that built compliance programs around the original Colorado AI Act timeline should re-read the replacement before turning anything off. The Employer Report.
- California SB 243 (companion chatbots) has been in force since January and requires AI-companion operators to disclose AI status and remind users every three hours, with extra obligations for minors. Worth a fresh read for any product whose chat surface holds long sessions with users. Jones Walker.
📌 Watch List
- Cost-per-token competition at the small end (Gemini 3.5 Flash, GPT-5.5 Instant).
- Long-running personal agents on cloud VMs (Spark, ChatGPT agent mode, Claude Managed Agents).
- World models for physics and video (Omni, follow-on releases expected at SIGGRAPH).
- Agent governance and observability inside workflow systems (ServiceNow AI Control Tower, Microsoft Agent 365).
- Mechanistic interpretability moving from research dashboards to automated loops.