🎯 Top 3 Things to Know
1. Anthropic split the Managed Agents control plane from the execution plane, shipping self-hosted sandboxes and MCP tunnels at Code with Claude. The orchestration loop (context management, error recovery, multi-step planning) stays on Anthropic's infrastructure. Tool execution moves to a sandbox the customer configures, either on its own cloud or on a managed provider like Cloudflare, Daytona, Modal, or Vercel. MCP Tunnels (research preview) then lets the agent reach private Model Context Protocol servers without exposing them to the public internet. The split addresses the recurring enterprise objection that hosted agents could not touch internal systems without breaking compliance perimeters. Teams that have stalled at proof-of-concept on data residency grounds now have a path forward. The number worth watching is sandbox cold-start latency on customer infrastructure versus Anthropic's own, since that is where the hybrid model's friction will show up first. Anthropic
2. Google researchers released BATS, a budget-aware framework that cuts AI agent tool-call costs by roughly a third without losing accuracy. The framework wraps a foundation model in a Budget Tracker, a prompt-level module that signals remaining token and tool-call budget on every step, then trains the agent to dig deeper or pivot based on resources left. On a deep-research benchmark, BATS hit comparable accuracy with 40.4% fewer search calls, 19.9% fewer browse calls, and 31.3% lower total cost. The piece comes weeks after an IDC survey reported that 92% of decision-makers said their deployed agents cost more than expected, with inference the dominant driver. Teams building search or browse agents on metered APIs should benchmark a budget-aware loop against their current fixed-effort loop before raising another infrastructure round. The cheapest win in the agent stack is no longer model selection; it is letting the agent know how much money is left. VentureBeat · arXiv
3. The EU's Digital Omnibus on AI postpones high-risk AI obligations to December 2027, the first amendments to the AI Act since adoption. Negotiators from the Council, Parliament, and Commission reached provisional agreement on May 7. High-Risk AI System obligations slip from August 2026 to December 2027. National regulatory sandboxes slip a year. Transparency rules still come into force in August 2026. The package also adds two new prohibitions: AI-generated non-consensual intimate imagery, and realistic depictions of identifiable people without consent. The delay reflects how unprepared standards bodies and notified-body capacity were for the original timeline. Vendors that were on a forced-march compliance plan now have eighteen extra months. Buyers that priced compliance into 2027 procurement should expect the value of the audit-readiness premium to soften. Council of the EU
🚀 Frontier Models & Features
- OpenAI rolled compliance updates into ChatGPT Enterprise/EDU Skills, adding a dedicated admin page, workspace-scoped permissions, upload scanning with risk review and blocking, and Compliance Logs Platform support. Codex CLI shipped improved MCP profile handling, more reliable tool schemas, and clearer hook context. OpenAI
- Anthropic also raised Claude Code and Opus API rate limits at the Code with Claude events in San Francisco and London, alongside the Managed Agents updates above. Anthropic
🔬 Research Worth Reading
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation (Lin, Li, Song, Jiang & Zhang — see arXiv link for affiliations). arXiv
- TL;DR: Treats skills as the agent's primary unit of capability and gives them a full lifecycle: an agent creates new skills on demand, stores them in a skill memory, organizes and selects from them at runtime, and refines each one against unit tests plus runtime feedback. The lifecycle replaces the usual one-shot tool-list approach.
- Stat: On the GAIA benchmark, the unified lifecycle lifts task success by a double-digit margin over baselines that only create or only reuse skills, while keeping a skill library small enough for prompt-time selection.
- Apply it: If your agent stack accumulates ad-hoc tools without a way to delete or rewrite them, run a one-week experiment that gates every new skill behind a unit test and a memory entry. The test gate is the part most teams skip.
Budget-Aware Tool-Use Enables Effective Agent Scaling (authors — see arXiv link / Google Research). arXiv
- TL;DR: Introduces the Budget Tracker prompt module and BATS test-time scaling on top of it, training agents to allocate remaining search and browse budget across exploration and exploitation rather than running fixed-effort loops.
- Stat: 40.4% fewer search calls, 19.9% fewer browse calls, 31.3% lower total cost at matched accuracy on deep-research tasks.
- Apply it: Add a remaining-budget line to your agent system prompt this week and instrument the per-step decision rate of stop versus continue. The instrumentation alone tends to surface the cheap pivots before you train anything.
🏢 Enterprise in the Wild
Sysco won Newsweek's 2026 AI Impact Award for its Sysco Agentic Ecosystem (SAGE), a company-wide platform that moved AI out of pilots and into daily operations across sales, supply chain, customer experience, and back office. The relevant detail for other food-and-distribution operators: SAGE was built as a shared agent runtime first, with use cases layered on top, rather than as a set of disconnected vertical agents. GlobeNewswire
🛠️ Tooling & Ecosystem
- Microsoft released RAMPART, a pytest-native safety and security testing framework for agentic AI, and Clarity, a structured design-review tool that documents intent, risks, and behavior before code ships. Both target the gap between traditional unit testing and agent-specific failure modes. Microsoft Tech Community
- Cohere released Command A+ under Apache 2.0 on May 20, the first major Command-family release usable commercially without a separate Cohere license. Cohere
⚖️ Policy & Regulation
Beyond the EU Digital Omnibus covered above, the new EU prohibitions on non-consensual AI imagery establish a direct content-liability hook that vendors of image and video models will need to surface in their model cards. The timeline relief on high-risk obligations does not extend to these prohibitions, which take effect on the original schedule. Inside Privacy
📌 Watch List
- Hybrid agent architectures: control plane hosted, execution plane in customer cloud.
- Budget-aware reasoning: cost-side test-time scaling moves from paper to production.
- Skill lifecycle as a first-class agent primitive: creation, memory, eval, refinement.
- EU AI Act compliance timelines softening: vendor and buyer plans both reshuffle.
- Enterprise agent runtimes as shared infrastructure rather than per-use-case builds.