tandemly.ai
Briefing · MAY 17 2026

AI daily briefing

🎯 Top 3 Things to Know

1. Anthropic published a research paper this weekend forecasting AGI by 2028 and arguing for tighter U.S. chip and model export controls on China. The paper landed during Trump's two-day Beijing visit and sketches two scenarios for the next three years. The headline is not the AGI date itself. It is the policy lever: in Anthropic's second scenario, looser export discipline lets Chinese labs close the capability gap by 2028 and erases the leverage Washington has been using to shape global AI norms. The argument is unusual because it comes from a model lab rather than a policy shop, and it commits Anthropic publicly to a hawkish position the same week the White House is reportedly weighing a separate pre-release vetting regime for frontier models. Worth watching whether OpenAI and Google DeepMind respond on the record, since compute and model-access restrictions cut both ways for them. Anthropic research

2. A new paper finds that grep beats vector retrieval inside the major coding agents, but the harness around the search tool moves accuracy more than the choice of retrieval method. "Is Grep All You Need?" tests four agent harnesses on a 116-question slice of LongMemEval. Claude Code holds a persistent grep advantage with Opus and Haiku. Gemini CLI holds a persistent vector advantage with Gemini 3.1 Pro. Same data, different harnesses, different winners. Vector search tends to win at small context, when the bundle is still manageable. Grep tends to win later, when the agent has to separate needle from haystack in a noisy context window. Relevant for anyone running RAG inside an agent loop. Production teams should benchmark grep against their vector setup before assuming the embedding pipeline is the thing doing the work. arXiv 2605.15184
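For teams that want to try that comparison, here is a minimal sketch of a side-by-side check in Python. Everything in it is an illustrative assumption rather than the paper's setup: the three-note corpus, the questions, the helper names (`grep_retrieve`, `vector_retrieve`, `hit_rate`), and the bag-of-words cosine score, which only stands in for a real embedding model. Scores on toy data like this say nothing about which method wins in production. The point is the shape of the test: same data, same gold answers, two retrievers.

```python
import math
import re
from collections import Counter

# Hypothetical memory notes and questions; replace with your own corpus.
DOCS = {
    "s1": "User said their dog is named Biscuit and loves the beach.",
    "s2": "User booked a flight to Lisbon for March.",
    "s3": "User mentioned switching from Postgres to SQLite for the prototype.",
}
CASES = [
    {"question": "What is the name of the user's dog?", "pattern": r"dog", "gold": "s1"},
    {"question": "Where is the user flying?", "pattern": r"flight|flying", "gold": "s2"},
    {"question": "Which database did the user switch to?", "pattern": r"sqlite|postgres", "gold": "s3"},
]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def grep_retrieve(pattern, docs, k=1):
    """Grep-style retrieval: ids of the first k docs matching the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if rx.search(text)][:k]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_retrieve(query, docs, k=1):
    """Toy vector retrieval: rank docs by bag-of-words cosine (assumption:
    a real setup would call an embedding model here instead)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine(q, Counter(tokenize(docs[doc_id]))),
        reverse=True,
    )
    return ranked[:k]


def hit_rate(retrieve, cases):
    """Fraction of cases whose gold doc id appears in the retrieved ids."""
    hits = sum(1 for case in cases if case["gold"] in retrieve(case))
    return hits / len(cases)


grep_score = hit_rate(lambda c: grep_retrieve(c["pattern"], DOCS), CASES)
vector_score = hit_rate(lambda c: vector_retrieve(c["question"], DOCS), CASES)
print(f"grep hit rate:   {grep_score:.2f}")
print(f"vector hit rate: {vector_score:.2f}")
```

To use it for real, swap `DOCS` and `CASES` for your own data and point the two lambdas at the retrievers your agent actually calls. Since the paper's finding is that the harness matters more than the retriever, the more informative run is the same comparison repeated inside each agent loop you are considering.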

3. Thinking Machines previewed "interaction models," a native multimodal architecture aimed at OpenAI's Realtime stack. Mira Murati's lab claims a 0.4-second average response latency against GPT-Realtime-2.0's 1.18 seconds. The architecture replaces the request-response loop with 200ms micro-turns and splits the system in two: a live interaction model that stays open to the user while a background reasoning model runs tools asynchronously and shares full conversation context. The bet is that real-time voice and video collaboration is not a faster chatbot. It is a different system topology, where listening, speaking, seeing, and pausing are trained behaviors rather than stitched components. A limited research preview is open now for feedback. Worth watching whether the latency advantage holds in adversarial conditions like overlapping speech and noisy audio. Semafor coverage
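The two-part topology described above can be pictured with a small `asyncio` sketch. This is not Thinking Machines' implementation, which has not been published in that detail. The names `live_loop` and `background_reasoner`, the shared `context` list, and the simulated tool delay are all hypothetical. The only figure taken from the announcement is the 200ms micro-turn. The sketch shows one idea: the live loop keeps ticking and responding while slower reasoning runs in the background against the same conversation context.

```python
import asyncio

MICRO_TURN_S = 0.2  # 200 ms micro-turn, the figure quoted in the announcement


async def background_reasoner(context, results, work_s):
    """Hypothetical slow tool call: reads the shared context, posts a result."""
    await asyncio.sleep(work_s)  # stands in for tool use or long reasoning
    await results.put(f"tool result after seeing {len(context)} context items")


async def live_loop(inputs, tick_s=MICRO_TURN_S, work_s=0.5):
    """Tick once per micro-turn; never block on the background task."""
    context = []               # conversation context shared by both parts
    results = asyncio.Queue()  # channel from the reasoner back to the live loop
    transcript = []
    pending = list(inputs)
    task = None

    while pending or (task is not None and not task.done()) or not results.empty():
        if pending:
            context.append(pending.pop(0))
            if task is None:
                # Start the slow work once, without awaiting it.
                task = asyncio.create_task(
                    background_reasoner(context, results, work_s)
                )
        # Fold in any finished background results without waiting for them.
        while not results.empty():
            message = results.get_nowait()
            context.append(message)
            transcript.append(("reasoner", message))
        # The live side stays responsive on every tick.
        transcript.append(("live", f"tick with {len(context)} context items"))
        await asyncio.sleep(tick_s)

    return transcript


if __name__ == "__main__":
    for speaker, text in asyncio.run(live_loop(["hi", "book a table for two"])):
        print(f"{speaker}: {text}")
```

In a request-response design the live side would await the tool call and go silent for its full duration. Here the loop only checks a queue each tick, which is the structural difference the announcement is pointing at.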

🚀 Frontier Models & Features

🔬 Research Worth Reading

🏢 Enterprise in the Wild

🛠️ Tooling & Ecosystem

⚖️ Policy & Regulation

📌 Watch List