tandemly.ai
Briefing · MAY 2 2026


AI daily briefing

Saturday — quieter day. Five strong items below; the cyber-eval thread and the new agent-cost paper are the ones to actually read.

🎯 Top 3 Things to Know

1. AISI publishes a Mythos cyber evaluation, plus a parallel one for GPT-5.5. The UK AI Security Institute's writeups reframe the story: it's not "one model is dangerous," it's "frontier offensive cyber capability is here across labs." Mythos was the first model to complete AISI's 32-step "The Last Ones" corporate-network takeover scenario end-to-end (a task pegged at roughly 20 human hours); GPT-5.5 reportedly matches it. The policy question stops being "should this one model ship?" and becomes "what's the cross-lab posture?" AISI on Mythos · AISI on GPT-5.5 · Anthropic preview page

2. First systematic study of agent token spend lands on arXiv. "How Do AI Agents Spend Your Money?" measures eight frontier LLMs on SWE-bench Verified. Five takeaways: agentic coding costs orders of magnitude more than chat; input tokens dominate spend even with caching; the cost of the same task varies by 30× across runs; more spend ≠ more accuracy (accuracy peaks at a mid-range budget, then degrades); and models are bad at predicting their own cost. It's the empirical baseline that cost-aware agent design has been waiting for. arXiv:2604.22750

3. Federal CIO signals caution on Mythos rollout despite Project Glasswing. The US federal CIO is reportedly wary of broader Mythos adoption across government, even as Anthropic onboards Glasswing partners (Amazon, Apple, Cisco, Microsoft, Palo Alto Networks, Linux Foundation). It's the first real test of "release dangerous-capability models only to defenders" as a doctrine. CyberScoop · Anthropic Glasswing


🚀 Frontier Models & Features


🔬 Research Worth Reading


🏢 Enterprise in the Wild


🛠️ Tooling & Ecosystem


⚖️ Policy & Regulation


📌 Watch List