May 26, 2026 — Tandemly Briefings

🎯 Top 3 Things to Know

1. Gemini Spark opens to US Google AI Ultra subscribers this week, the first broad consumer test of a persistent personal agent. Spark runs continuously on dedicated cloud VMs, holds task state across days, and ships with MCP connections to Canva, OpenTable, and Instacart at launch. Adobe, Spotify, GitHub, Notion, and Slack are queued for summer. The pricing matters as much as the feature set: Google cut the Ultra plan from $250 to $100 a month to put Spark in front of more wallets. The first real signal will be connector usage and churn over the next 60 days. If Spark holds onto its scheduled-task users instead of being treated like a faster chatbot, the always-on agent category gets validated. EU and UK availability is gated on AI Act transparency compliance and is not expected before Q3. TechCrunch

2. Microsoft and Anthropic are in late-stage talks for Anthropic to run inference on Microsoft's Maia 200 chips. Microsoft's $5 billion investment in November came bundled with a $30 billion Azure commitment. The chip deal would shift part of that compute onto Microsoft silicon rather than Nvidia hardware. Nadella said in April that Maia 200 offers "over 30% improved tokens per dollar" against current Azure inventory. For Anthropic, this is a hedge against the Nvidia supply chain that already gates its SpaceX-Colossus contract. For Microsoft, it would be the first credible Maia customer outside its own first-party workloads, the gating step before Azure can sell Maia at scale. The deal is not closed. Watch for confirmation language in either company's next quarterly call. CNBC

3. Intuit cut 17% of its workforce, then signed multi-year deals with Anthropic and OpenAI in the same week. The 3,000 layoffs land July 31. CEO Sasan Goodarzi told CNBC the cut was about flattening management layers after the Credit Karma and TurboTax integration, "nothing to do with AI." The new contracts make Claude and ChatGPT first-class surfaces for Intuit's tax, accounting, and personal-finance flows, with Intuit's data feeding back into both. The pattern is now familiar: enterprise software companies are restructuring around the assumption that a smaller team plus model APIs ships the same roadmap. Intuit's projected revenue lift to $21.34B confirms management is selling this as growth, not cost reduction. TechCrunch

🚀 Frontier Models & Features

Salesforce Agentforce Coworker hits general availability. Following last week's beta, Coworker is now live for all Agentforce customers on Enterprise, Unlimited, and Agentforce 1 editions as part of the Summer '26 release. Plain-language CRM queries and actions surface in Salesforce, Slack, Teams, and ChatGPT. Salesforce Ben
NVIDIA GTC Taipei keynote to preview Vera CPUs. Jensen Huang takes the Taipei Music Center stage on June 1, with the Vera CPU (claimed to be 1.5x faster than x86 alternatives for agentic workloads) as the headline hardware reveal. NVIDIA blog

🔬 Research Worth Reading

Compute Where it Counts: Self Optimizing Language Models (Akhauri, Abdelfattah / Cornell). arXiv
- TL;DR: Pair a frozen LLM with a tiny policy network that reads hidden state at each decode step and picks an efficiency action (attention sparsity, structured MLP pruning, activation quantization bit-width). The model decides per-token how much compute to spend.
- Stat: SOL, the paper's method, improves MMLU accuracy by up to 7.3 points over uniform-budget allocation at matched FLOPs, and finds a better quality-efficiency Pareto front than static methods.
- Apply it: If you serve a fixed model and want headroom without retraining, prototype a learned per-token compute policy. Start with one knob and compare quality at matched compute against your current static config.
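
A toy sketch of the per-token policy idea, not the paper's implementation. It assumes a hypothetical linear policy head that reads a hidden-state vector at each decode step and picks one activation bit-width. The action set, weights, and relative cost figures are illustrative placeholders, and the hidden states are random stand-ins for a real model's activations.

```python
import random

# Hypothetical action set: a single knob (activation bit-width).
BIT_WIDTHS = [4, 8, 16]
# Illustrative relative compute cost per action (assumed, not measured).
RELATIVE_COST = {4: 0.25, 8: 0.5, 16: 1.0}


def policy_scores(hidden, weights):
    """Linear policy head: one score per action from the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def choose_bit_width(hidden, weights):
    """Pick the highest-scoring action for this decode step."""
    scores = policy_scores(hidden, weights)
    best = max(range(len(scores)), key=scores.__getitem__)
    return BIT_WIDTHS[best]


def average_cost(hidden_states, weights):
    """Mean relative compute over a sequence, plus the per-token choices."""
    choices = [choose_bit_width(h, weights) for h in hidden_states]
    mean_cost = sum(RELATIVE_COST[c] for c in choices) / len(choices)
    return mean_cost, choices


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    # Random weights stand in for a trained policy.
    weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in BIT_WIDTHS]
    hidden_states = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
    cost, choices = average_cost(hidden_states, weights)
    print(f"per-token choices: {choices}")
    print(f"mean relative cost: {cost:.2f} vs 1.00 for static 16-bit")
```

In a real prototype the policy would be trained against a quality signal, and the comparison that matters is task quality at the same mean cost as the static configuration.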
Beyond Scaling: Agents Are Heading to the Edge (Tian, Cai, Zhao, Lane). arXiv
- TL;DR: A position paper arguing small models running close to data already pass the operational bar for most personal-agent use cases, and that benchmark performance hides the thermal, memory, and coordination overhead of multi-step agent execution.
- Stat: The authors frame agency as "architectural proximity" and identify KV-cache multiplexing, local speculative decoding, and shared-memory routing as the three open hardware-software co-design problems for edge agents.
- Apply it: Audit your agent pipeline for round-trips that exist only because the model lives in the cloud. Any step driven by local sensor data, user files, or system logs is a candidate for an on-device model that doesn't need to be at the frontier.
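
The audit can be sketched as a filter over a pipeline description. This is a minimal illustration under stated assumptions: the `Step` record, its field names, and the source labels are hypothetical, and a step is flagged only when all of its inputs are local, which is a stricter rule than the item itself states.

```python
from dataclasses import dataclass

# Source kinds the item names as local data (labels are hypothetical).
LOCAL_SOURCES = {"sensor", "user_file", "system_log"}


@dataclass
class Step:
    name: str
    sources: set   # kinds of data the step reads
    runs_in: str   # "cloud" or "device"


def on_device_candidates(steps):
    """Return names of cloud steps whose inputs are all local data."""
    return [
        s.name
        for s in steps
        if s.runs_in == "cloud" and s.sources and s.sources <= LOCAL_SOURCES
    ]


# Hypothetical pipeline for illustration.
pipeline = [
    Step("summarize_logs", {"system_log"}, "cloud"),
    Step("web_research", {"web"}, "cloud"),
    Step("tag_photos", {"user_file", "sensor"}, "cloud"),
    Step("wake_word", {"sensor"}, "device"),
]

print(on_device_candidates(pipeline))
```

Steps that mix local and remote inputs are left out on purpose; they need a closer look than a set comparison can give.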
What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit (Naser Moghadasi, Ghaderi / Texas Tech, UT Arlington). arXiv
- TL;DR: Reads twelve canonical LLM agent benchmark papers against a five-field disclosure schema (benchmark identity, harness spec, inference settings, cost reporting, failure breakdown) and scores how much of the evaluation can actually be reconstructed.
- Stat: Most papers disclose model name and topline score, but inference settings and cost-per-task are routinely absent, which the authors argue explains why papers reporting the same benchmark with the same model disagree.
- Apply it: Before quoting a benchmark number, check whether the source paper discloses temperature, max tokens, retry policy, and per-task cost. If any are missing, treat the number as directional, not comparative.
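
The check can be written as a small gate. This sketch assumes each paper is summarized as a dictionary; the field names and the two example records are hypothetical and not drawn from any audited paper.

```python
# Settings the item says to check before quoting a benchmark number.
REQUIRED_FIELDS = ("temperature", "max_tokens", "retry_policy", "cost_per_task")


def missing_disclosures(paper):
    """List required fields that are absent or None in a paper record."""
    return [f for f in REQUIRED_FIELDS if paper.get(f) is None]


def classify(paper):
    """Return 'comparative' only if every required field is disclosed."""
    return "directional" if missing_disclosures(paper) else "comparative"


# Hypothetical records for illustration.
paper_a = {"temperature": 0.0, "max_tokens": 4096,
           "retry_policy": "none", "cost_per_task": 0.12}
paper_b = {"temperature": 0.7, "max_tokens": 2048}

for name, paper in (("paper_a", paper_a), ("paper_b", paper_b)):
    print(name, classify(paper), missing_disclosures(paper))
```

The check compares against `None` rather than truthiness so that a disclosed temperature of 0.0 still counts as reported.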

🏢 Enterprise in the Wild

DeepSeek hired a former Jane Street engineer to lead a new "AI harness" team in Hangzhou, building the deterministic scaffolding to turn DeepSeek V4 into revenue-generating agents. The hire signals DeepSeek is moving past the model-release cadence to compete on agent infrastructure, where Anthropic's Managed Agents and OpenAI's Frontier already have enterprise traction. The New Stack

🛠️ Tooling & Ecosystem

Microsoft 365 Copilot adds GPT-5.5 Thinking to Copilot Chat and ships agentic browsing into the enterprise stack, letting Copilot navigate websites, fill forms, and complete multi-step tasks inside the managed-tenant boundary. Microsoft Learn

⚖️ Policy & Regulation

On May 19, the European Commission opened consultation on draft guidelines for classifying high-risk AI systems, with feedback due before the EU AI Act Omnibus high-risk obligations land in December 2027. The transparency obligations under Article 50 take effect August 2, 2026, on the original schedule. Vendors deploying to EU users in Q3 should treat August as a real deadline, not a deferred one. European Commission

📌 Watch List

Edge-native agents: hardware-software co-design as the next bottleneck after model quality.
Per-token adaptive compute: SOL and similar work pointing to learned efficiency policies as deployable today.
Benchmark disclosure: a small group of papers arguing the reproducibility problem is methodological, not technical.
Anthropic compute diversification: Maia 200 talks alongside Colossus suggest a deliberate move away from Nvidia exposure.
Always-on agents: Spark and Anthropic Cowork as the first two products in the persistent personal-agent category.