May 25, 2026 — Tandemly Briefings

🎯 Top 3 Things to Know

1. Pope Leo XIV releases his first encyclical on AI, sharing the Synod Hall stage with Anthropic's Chris Olah. Magnifica Humanitas, signed on May 15 and presented at the Vatican this morning, frames artificial intelligence as the industrial revolution of this century and argues for protections of "the human person" as systems become more capable. The choice of co-presenter matters as much as the text. The Vatican picked Anthropic's head of interpretability, not a Google or OpenAI representative, signaling which lab the Church considers a credible partner on safety. The encyclical lands the same week Anthropic projects its first quarterly operating profit and closes a $30 billion round, so the institutional positioning has commercial weight behind it. Worth reading in full once the official translation appears, especially the sections on labor and human dignity that will likely be cited in EU and US policy debates over the next year. Vatican News

2. An OpenAI internal model disproved an 80-year-old Erdős conjecture in discrete geometry, and Tim Gowers says he would have accepted the proof to the Annals of Mathematics without hesitation. The unit-distance conjecture, posed by Erdős in 1946, claimed that a skewed square grid was close to optimal for maximizing the number of point pairs exactly one unit apart. The model produced an infinite family of constructions using Golod-Shafarevich theory and class field towers that beat the grid by a polynomial factor. Nine external mathematicians, including Noga Alon and Melanie Wood, verified the result in a companion paper. This is the first time a prominent open problem in a subfield of mathematics has been solved autonomously by a general-purpose model, not a specialized theorem prover. The result is preliminary in one important sense. Mathematicians are still unpacking how much of the proof reflects genuine search through algebraic structure versus retrieval of relevant prior work the model had seen. Either way, the bar for "AI-assisted" versus "AI-authored" mathematics just moved. OpenAI

3. Anthropic acquired Stainless for over $300 million, then wound down the hosted SDK products that OpenAI, Google, Meta, and Cloudflare all relied on. Stainless turned OpenAPI specs into production SDKs across Python, TypeScript, Go, Java, and more. Anthropic now controls that pipeline and has closed it to new signups. Competitors keep ownership of SDKs already generated but face a build-or-switch decision at contract renewal, with Speakeasy and LibLab as the obvious alternatives. The strategic frame Anthropic put on the deal is that agents acting across APIs depend on high-quality client libraries and MCP servers, so owning the generator is owning the on-ramp. Worth tracking how OpenAI and Google respond, since rebuilding SDK tooling at scale is a real cost center. Anthropic

🚀 Frontier Models & Features

Salesforce Agentforce Coworker (beta) is live. Marc Benioff announced the rollout on May 21. Coworker embeds an AI teammate into Salesforce Global Search, letting users query CRM data in plain language and trigger actions without copy-paste between tabs. Salesforce Ben
Gemini Spark gets third-party MCP support within weeks. Google confirmed at I/O that Spark, the $100/month always-on agent announced May 19, will reach Canva, Instacart, and OpenTable via MCP. Canva is already live for Ultra subscribers; Adobe and CapCut follow. TechCrunch

🔬 Research Worth Reading

Interpretability Can Be Actionable (Orgad, Barez, Haklay, Lee, Mosbach, Reusch, Saphra, Wallace, Wiegreffe, Wong, Tenney, Geva et al.). arXiv
- TL;DR: A position paper arguing the field should stop measuring interpretability by elegance of internal explanations and start measuring it by whether the insight enables a concrete intervention.
- Stat: The authors identify five domains where actionable interpretability has unique leverage beyond what behavioral evaluation alone can provide, and argue most current work falls short on at least one of concreteness or validation.
- Apply it: Before approving an interpretability finding for production work, write down the specific intervention it enables. If you cannot, treat the finding as exploratory and do not let it drive decisions about model behavior.
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG (Zheng, Worring / University of Amsterdam). arXiv
- TL;DR: Run agentic RAG's thoughts and subqueries as latent tokens in hidden state, not as autoregressive text the model has to write out, which removes most of the iteration latency.
- Stat: Agentic RAG's latency overhead is dominated by autoregressive generation of lengthy thoughts and subqueries; LatentRAG produces them in a single forward pass.
- Apply it: If your agentic RAG loop spends most of its wall-clock budget on chain-of-thought tokens rather than retrieval, try collapsing the thought step into hidden-state computation and measure tokens-saved versus accuracy-lost.
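A minimal sketch of the measurement the "Apply it" step describes, assuming you already log per-step timings and eval results. The names `thought_time_share`, `RunStats`, and `tradeoff` are hypothetical helpers, not part of LatentRAG, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


def thought_time_share(steps: list[tuple[float, float]]) -> float:
    """Share of wall-clock time spent generating thoughts.

    Each step is (thought_seconds, retrieval_seconds), taken from your own logs.
    """
    thought = sum(t for t, _ in steps)
    total = sum(t + r for t, r in steps)
    if total == 0:
        raise ValueError("no time recorded")
    return thought / total


@dataclass
class RunStats:
    accuracy: float        # fraction correct on your eval set, 0..1
    generated_tokens: int  # tokens the model wrote out as text


def tradeoff(baseline: RunStats, candidate: RunStats) -> dict:
    """Tokens saved versus accuracy lost when moving from baseline to candidate."""
    saved = baseline.generated_tokens - candidate.generated_tokens
    return {
        "tokens_saved": saved,
        "tokens_saved_frac": saved / baseline.generated_tokens,
        "accuracy_lost": baseline.accuracy - candidate.accuracy,
    }


# Illustrative, made-up numbers only.
share = thought_time_share([(3.0, 1.0), (5.0, 1.0)])
result = tradeoff(RunStats(0.80, 10_000), RunStats(0.78, 2_500))
```

If the thought share is low, the loop is retrieval-bound and collapsing the thought step is unlikely to pay off, whatever the token savings.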
Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict (Chen, Qian, Wang, Zhang et al.). arXiv
- TL;DR: Introduces Context-Driven Decomposition, an inference-time probe for detecting when a RAG system is being "obedient" to retrieved context that conflicts with what the model actually knows.
- Stat: The probe operates as both a diagnostic and an intervention mechanism for controlled retrieval-versus-parametric conflict.
- Apply it: Add a context-compliance metric to your retrieval evals. Inject deliberately wrong retrieved passages and measure how often the model defers to them anyway. Anything above your tolerance threshold is a production risk.
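One way the injection test above could be scored, as a rough sketch. This is not the paper's Context-Driven Decomposition probe; it is a simple behavioral metric. `ConflictCase`, `context_compliance_rate`, and `stub_answer_fn` are hypothetical names, and the substring match is a crude stand-in for whatever answer grader your evals already use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ConflictCase:
    question: str
    wrong_passage: str  # deliberately incorrect "retrieved" context
    wrong_answer: str   # the answer that the wrong passage supports


def context_compliance_rate(
    answer_fn: Callable[[str, str], str],
    cases: Iterable[ConflictCase],
) -> float:
    """Fraction of cases where the answer echoes the injected wrong answer."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one conflict case")
    deferred = sum(
        1
        for c in cases
        # Crude scorer (assumption): case-insensitive substring match.
        if c.wrong_answer.lower() in answer_fn(c.question, c.wrong_passage).lower()
    )
    return deferred / len(cases)


def stub_answer_fn(question: str, passage: str) -> str:
    """Hypothetical stand-in for a RAG call: resists one fact, else parrots context."""
    if "capital of France" in question:
        return "Paris"
    return passage


CASES = [
    ConflictCase(
        "What is the capital of France?",
        "The capital of France is Lyon.",
        "Lyon",
    ),
    ConflictCase(
        "At what temperature does water boil at sea level?",
        "Water boils at 50 degrees Celsius at sea level.",
        "50 degrees",
    ),
]

rate = context_compliance_rate(stub_answer_fn, CASES)
```

In practice `answer_fn` would wrap your real pipeline with the retriever's output overridden, and the resulting rate would be compared against the tolerance threshold you set.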

🏢 Enterprise in the Wild

JPMorgan now runs 450-plus AI agent use cases in production. Investment banking pitch decks that took junior analysts hours are generated in roughly 30 seconds. Trade settlement and fraud detection are largely automated. The bank is the most visible reference point for what "enterprise-wide" agent deployment looks like at a regulated financial institution, ahead of broader industry rollouts in 2026. Kore.ai analysis

OpenAI launched the OpenAI Deployment Company, a $4 billion entity majority-owned by OpenAI in partnership with 19 global integrators and consultancies, and acquired the applied-AI firm Tomoro to staff it with 150 forward-deployed engineers. The pitch is end-to-end deployment for Fortune 500s, from workflow redesign to durable production systems. PYMNTS

🛠️ Tooling & Ecosystem

TanStack npm supply-chain attack postmortem. The May 11 incident, attributed to TeamPCP, exploited the pull_request_target "Pwn Request" pattern and GitHub Actions cache poisoning to publish 84 malicious package versions via TanStack's legitimate release pipeline. It is the first documented case of a malicious npm package carrying valid SLSA provenance. The worm payload spread to 170-plus packages, harvested credentials from 100-plus file paths, and installed persistence hooks in Claude Code and VS Code. Worth a full read by anyone running CI for an open-source package. TanStack postmortem
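For readers who want a quick first pass over their own workflows, here is a heuristic sketch that flags the general shape of the "Pwn Request" pattern: a workflow triggered by pull_request_target that also references the pull request's head commit. The function name and sample workflows are illustrative assumptions, not TanStack's actual configuration, and a text match like this is no substitute for a real audit.

```python
import re

# Expressions that point at the untrusted PR head rather than the base branch.
HEAD_REF = re.compile(r"github\.event\.pull_request\.head\.(sha|ref)")


def looks_like_pwn_request(workflow_text: str) -> bool:
    """Flag workflows that run on pull_request_target and reference the PR head.

    Heuristic only: strips everything after '#' on each line so commented-out
    triggers are ignored, which can also truncate strings that contain '#'.
    """
    lines = [line.split("#", 1)[0] for line in workflow_text.splitlines()]
    body = "\n".join(lines)
    return "pull_request_target" in body and HEAD_REF.search(body) is not None


# Illustrative workflow texts (assumptions, not taken from any real repository).
RISKY = """
on: pull_request_target
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - run: npm ci && npm test
"""

SAFE = """
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
"""

risky_flag = looks_like_pwn_request(RISKY)
safe_flag = looks_like_pwn_request(SAFE)
```

A hit means the workflow deserves a manual look at what secrets, caches, and write permissions the job can reach while running code from a fork.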

⚖️ Policy & Regulation

The White House postponed Trump's executive order that would have created a voluntary 90-day federal review of frontier models before public release. Trump said it "gets in the way" and reiterated his framing of the issue as keeping the US lead over China. The order had been negotiated with OpenAI and Anthropic; what replaces it is unclear, leaving the pre-release government-access question open. NPR

The EU AI Act Omnibus, agreed by Council and Parliament on May 7, defers high-risk AI obligations from August 2026 to December 2027 (a 16-month slip) and product-regulated systems by a further year. Two new prohibitions take effect December 2026: AI-generated non-consensual intimate imagery and CSAM. The compliance breathing room is real, but it does not cover the new prohibitions, which bind from December 2026. Global Policy Watch

📌 Watch List

Chinese models passed 60% of token consumption on OpenRouter, led by Tencent Hy3 preview and DeepSeek-V4-Flash.
Budget-aware agent routing: several recent papers converging on adaptive cheap-vs-expensive model selection per step.
Long-context safety: refusal mechanisms degrade unpredictably past 100K tokens in agentic settings.
Mathematics-by-model: how the field absorbs the Erdős result will set norms for AI authorship in proofs.