May 7, 2026 — Tandemly Briefings

🎯 Top 3 Things to Know

Anthropic's "Project Glasswing" is the biggest cyber-AI story of the year. At Code w/ Claude yesterday Anthropic disclosed that Claude Mythos Preview — an unreleased frontier model — has surfaced thousands of zero-day vulnerabilities across every major OS and browser, including a 27-year-old bug in OpenBSD. The model is too dangerous to ship: Anthropic is gating it to a small set of defenders (AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) backed by $100M in usage credits. The same capability that patches will also exploit — that's the whole policy story now. (Anthropic Glasswing · Mythos card)

Code w/ Claude — what shipped, in one paragraph. Cloud Routines (templated agents triggered on schedule, GitHub events, or API call) and /ultrareview (parallel reviewer fleet for pre-merge bug-hunting) both moved out of preview. Pro/Max five-hour limits doubled. A SpaceX deal puts Anthropic on the Colossus data center. API traffic is up 17× year-over-year. (Anthropic routines blog · /ultrareview docs · Simon Willison live blog)

EU AI Act Omnibus crosses the finish line. Council and Parliament reached a provisional agreement today on the package that delays high-risk AI rules (flagged yesterday) and tightens transparency: providers of synthetic-content systems now have 3 months — not 6 — to implement labeling after general application starts. The AI Office gets clearer GPAI supervision authority, with carve-outs preserving national competence for law enforcement, border, judicial, and financial-sector use. (EU Council press release)

🚀 Frontier Models & Features

Anthropic — Code w/ Claude rate-limit doubling. The Pro/Max/Enterprise five-hour Claude Code window is now 2× larger at the same price. (release notes)

Google — Gemini 3.1 Flash-Lite. Efficiency-focused refresh: 2.5× faster response, 45% faster output, $0.25 / M input tokens. The mid-tier price war is now the loudest signal in the market. (llm-stats updates)

🔬 Research Worth Reading

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces (this week, arXiv preprint). Frames RL on the trace — the temporal interaction graph of spawn / delegate / call / aggregate / stop events — rather than on individual turns. Cites Moonshot's Kimi K2.5 Agent Swarm scaling Parallel-Agent RL to ~4,000 coordinated steps as the empirical anchor. Verdict: read full paper if you're building multi-agent harnesses; this is where the training signal is moving. (arXiv 2605.02801)

Architectural Design Decisions in AI Agent Harnesses (recent, arXiv). Empirical study of 70 public agent projects. Names recurring axes (action-space constraint, state management, permissioning, failure handling, resource control) and the patterns that recur. Less novel theory, more useful taxonomy. Verdict: skim — the figures and the typology table are the value. (arXiv 2604.18071)

🏢 Enterprise in the Wild

Customers Bank — OpenAI partnership and an AI-clone earnings call. The CEO ran an earnings call as an AI clone trained on his prior commentary, and the bank is now formalizing an OpenAI deal to automate finance workflows. The clone-on-the-call detail is the part worth watching: it normalizes synthetic-executive presence on regulated calls. (CNBC)

🛠️ Tooling & Ecosystem

⭐ ANTHROPIC — Cloud Routines. Templated Claude Code automations that run on Anthropic's web infrastructure, not your laptop. Triggered on schedule, GitHub events, or API call. Pro 5/day, Max 15/day, Team/Enterprise 25/day. The phone-continuity story for Claude Code is now real. Lift for your own harness: the trigger-set (cron + webhook + API) is the right minimum surface for any "agent that works while you sleep" — copy that taxonomy. (Anthropic blog)

⭐ ANTHROPIC — /ultrareview GA. Cloud-resident fleet of reviewer agents that surface and verify bugs before merge. Costs $5–$20 per run after the free trial. Notable as a concrete pricing precedent for "fleet of agents per task" billing. (docs)

Agent harness deep-dive — design-decisions taxonomy. See the arXiv paper above. Lift for your own harness: treat permissioning and failure-handling as first-class architectural axes — most projects only formalize tool-routing and state. (arXiv 2604.18071)

⚖️ Policy & Regulation

EU AI Act Omnibus provisional agreement (today). See Top 3. Affects every GPAI provider and any high-risk-AI deployer that was sequencing for the original August 2026 date. (Council release)

📌 Watch List

Cloud-resident agent fleets as a pricing model — /ultrareview's $5–$20 per run is the first mainstream example of per-task pricing for parallel agents. (/ultrareview docs)
Multi-agent training on orchestration traces — RL signal moves from per-turn to per-graph; expect more papers framing the trace as the unit of optimization. (arXiv 2605.02801)
Harness engineering as a named discipline — design-decisions taxonomy is consolidating; permissioning and failure-handling are the under-formalized axes. (arXiv 2604.18071)
Recursive Language Models — context-folding via REPL still the cleanest answer to long-context degradation; RLM-Qwen3-8B is the empirical anchor to watch. (arXiv 2512.24601)
World models / JEPA — LeWorldModel's stable end-to-end JEPA from pixels is the thread; planning ~48× faster than foundation-model-based world models on small compute. (arXiv 2603.19312)
Frontier-model dual-use disclosure — Project Glasswing is the new template: ship the capability behind a curated defender consortium rather than to the public API. (Anthropic Glasswing)