May 9, 2026 — Tandemly Briefings

🎯 Top 3 Things to Know

1. Anthropic, Blackstone, Hellman & Friedman, and Goldman Sachs launched a $1.5 billion AI services firm to embed Claude inside mid-sized companies. The standalone entity will deploy engineers directly into portfolio companies to redesign workflows around agents, starting with PE-owned businesses in healthcare, manufacturing, financial services, retail, and real estate. Anthropic, Blackstone, and H&F each contributed roughly $300 million; Goldman adds $150 million. The friction is implementation capacity. Frontier models keep outpacing what middle-market firms can absorb, and traditional consultants do not own the model. The structure mirrors Palantir's forward-deployment playbook with the model vendor on the inside, compressing the loop between deployment learning and model behavior. Worth watching whether OpenAI's parallel joint venture, announced the same day, targets the same segment. Anthropic announcement · CNBC

2. Snyk integrated Claude into its AI Security Platform, with red-teaming for prompt injection and runtime policy on agent tool calls. Snyk's Evo product, now powered by Claude, inventories AI assets across an organization (models, agents, MCP servers, datasets, third-party tools), red-teams running agents for prompt injection and data exfiltration, and enforces policy on tool calls before damage occurs. The numbers from Snyk's 2026 State of Agentic AI Adoption Report frame the urgency: 65 to 70 percent of production code is now AI-generated, and roughly half contains vulnerabilities. Each model an enterprise deploys pulls in nearly three times as many additional software components, 82 percent third-party. Traditional AppSec was not built for this surface. Available to joint customers now. Snyk announcement · Help Net Security

3. Google DeepMind's AI Co-Mathematician hit 48 percent on FrontierMath Tier 4, the hardest publicly tracked math benchmark. The system is a stateful workbench rather than a single model. Mathematicians work asynchronously with agents handling ideation, literature search, computational exploration, theorem proving, and theory building, with explicit tracking of failed hypotheses. In early tests it helped human researchers solve open problems and surface overlooked references. Tier 4 has been the wall most reasoning models barely scratch. A 48 percent score suggests the bottleneck has moved from raw reasoning to scaffolding: the long-running workspace, not the next model. arXiv 2605.06651

🚀 Frontier Models & Features

Anthropic Dreams for Managed Agents in research preview. A self-learning loop on the Claude Console that lets agents improve from prior run results. Outcomes, multiagent orchestration, and webhooks moved to public beta the same week. Anthropic platform release notes

Google Gemini 3.1 Flash Lite is generally available. Tuned for high-volume agentic workloads, translation, and cost-sensitive deployments. LLM Stats roundup

AWS Bedrock AgentCore added native payments via Privy. Agents can reason, act, and transact through wallet infrastructure built into the runtime, with payment intent scripted inside the agent loop rather than bolted on. SD Times weekly

🔬 Research Worth Reading

AI Co-Mathematician: Accelerating Mathematicians with Agentic AI (Google DeepMind). arXiv 2605.06651
- TL;DR: Frame frontier math research as an asynchronous workbench, not a chat. Agents track failed hypotheses, manage uncertainty, and emit native mathematical artifacts (proofs, code, theorems) that the mathematician edits in place.
- Stat: 48 percent on FrontierMath Tier 4, the hardest tier of the benchmark. Early external researchers report the system surfaced overlooked literature and contributed to open problems.
- Apply it: For agent products that support expert workflows (legal, scientific, engineering), stop optimizing single-turn answers. Add a stateful workspace with explicit hypothesis tracking and let the user edit the artifact, not the prompt.
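
The "Apply it" advice above can be made concrete with a small data structure. What follows is a minimal sketch, not the paper's design: the `Workspace`, `Hypothesis`, and `Status` names and every method are hypothetical, chosen only to illustrate keeping failed hypotheses on record and letting the user edit the artifact directly.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    SUPPORTED = "supported"
    REFUTED = "refuted"


@dataclass
class Hypothesis:
    statement: str
    status: Status = Status.OPEN
    notes: list[str] = field(default_factory=list)


@dataclass
class Workspace:
    """Hypothetical stateful workspace: one shared artifact plus a hypothesis log."""

    artifact: str = ""
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def propose(self, statement: str) -> Hypothesis:
        h = Hypothesis(statement)
        self.hypotheses.append(h)
        return h

    def resolve(self, h: Hypothesis, status: Status, note: str) -> None:
        # Refuted hypotheses stay in the log so later runs can avoid repeating them.
        h.status = status
        h.notes.append(note)

    def edit_artifact(self, new_text: str) -> None:
        # The user (or an agent) rewrites the artifact itself instead of re-prompting.
        self.artifact = new_text

    def dead_ends(self) -> list[str]:
        return [h.statement for h in self.hypotheses if h.status is Status.REFUTED]


ws = Workspace()
h1 = ws.propose("Lemma holds for all n")
h2 = ws.propose("Lemma holds for even n")
ws.resolve(h1, Status.REFUTED, "counterexample at n = 3")
ws.resolve(h2, Status.SUPPORTED, "checked numerically up to n = 1000")
ws.edit_artifact("Lemma 1, restricted to even n")
print(ws.dead_ends())
```

The design choice worth noting is that `resolve` never deletes anything: the list of dead ends is part of the state an agent reads before proposing its next idea.
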

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key (Purdue, UNC, Georgia Tech, UC San Diego). arXiv 2605.06638
- TL;DR: A controlled framework called ScaleLogic varies reasoning depth and the expressiveness of the underlying logic independently, so RL scaling laws can be measured cleanly.
- Stat: RL training compute scales as a power law in reasoning depth (T proportional to D^gamma, R-squared above 0.99). Gamma rises from 1.04 to 2.60 as logical expressiveness increases. More expressive curricula give up to 10.66 points of transfer gain.
- Apply it: When training or fine-tuning reasoning models, vary expressiveness alongside difficulty. Curricula that only stretch horizon length leave models under-trained; more expressive curricula push them much further at similar compute.
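
To get a feel for what the reported exponents mean, the sketch below works through the power law T = c * D^gamma from the Stat line. The function names are illustrative, the constant `c = 3.0` is arbitrary, and the data in the fitting check is synthetic, not taken from the paper; only the two gamma values (1.04 and 2.60) come from the summary above.

```python
import math


def compute_multiplier(depth_ratio: float, gamma: float) -> float:
    """Factor by which compute grows when depth grows by depth_ratio, if T = c * D**gamma."""
    return depth_ratio ** gamma


def fit_gamma(depths: list[float], compute: list[float]) -> float:
    """Least-squares slope of log(T) against log(D), which is the power-law exponent."""
    xs = [math.log(d) for d in depths]
    ys = [math.log(t) for t in compute]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Doubling reasoning depth at the lowest and highest reported exponents.
low = compute_multiplier(2.0, 1.04)
high = compute_multiplier(2.0, 2.60)
print(f"gamma=1.04: x{low:.2f}, gamma=2.60: x{high:.2f}")

# Synthetic check (made-up data): the log-log fit recovers the exponent we planted.
depths = [2.0, 4.0, 8.0, 16.0]
compute = [3.0 * d ** 2.60 for d in depths]
print(round(fit_gamma(depths, compute), 2))
```

Under these assumptions, doubling depth costs roughly 2.06 times the compute at gamma = 1.04 but roughly 6.06 times at gamma = 2.60, which is why the exponent matters more than the constant when budgeting long-horizon training.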

🏢 Enterprise in the Wild

Salesforce restructured as an agent-first platform with Headless 360. Every workflow, object, and business logic surface is now exposed through APIs, MCP tools, and CLI commands. Incumbent CRMs are repositioning as substrate for outside agents rather than destination apps. SD Times

Cognizant launched Secure AI Services. The offering bundles secure agent development, production behavior monitoring, identity and access for agents, audit evidence, and generative AI risk management. It targets the same governance gap that Snyk's report names: 72 percent of enterprises report agentic AI in production, but 60 percent have no governance layer. SD Times

Tether AI released QVAC MedPsy, an open-source on-device clinical model. The 4B variant outperforms MedGemma 27B on clinical benchmarks. A 4B medical model is small enough to run inside hospital firewalls without round-tripping patient data to a vendor cloud. SD Times

🛠️ Tooling & Ecosystem

Salesforce Data 360 MCP Server (Developer Preview). A facade-tool architecture exposes roughly 200 Salesforce API operations to coding agents and assistants without flooding the context window. Single-org, single-user during preview. Salesforce Developers
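
One common way to read "facade-tool architecture" is that the agent sees a handful of generic tools (search, describe, invoke) instead of one tool per API operation. The sketch below illustrates that general pattern only. It is an assumption about the idea, not Salesforce's implementation: the operation names, the registry, and the three facade functions are all hypothetical.

```python
from typing import Callable

# Hypothetical registry standing in for a large API surface (names are invented).
OPERATIONS: dict[str, tuple[str, Callable[..., object]]] = {
    "segments.list": ("List audience segments", lambda: ["vip", "churn-risk"]),
    "segments.count": (
        "Count members of a segment",
        lambda segment: {"vip": 120, "churn-risk": 45}[segment],
    ),
    "streams.list": ("List data streams", lambda: ["web", "pos"]),
}


def search_operations(query: str) -> list[str]:
    """Return names of operations whose name or description mentions the query."""
    q = query.lower()
    return sorted(
        name
        for name, (description, _) in OPERATIONS.items()
        if q in name.lower() or q in description.lower()
    )


def describe_operation(name: str) -> str:
    """Return the description for one operation, fetched only when needed."""
    return OPERATIONS[name][0]


def invoke_operation(name: str, **kwargs: object) -> object:
    """Dispatch a call to the underlying operation."""
    return OPERATIONS[name][1](**kwargs)


# Only these three tools would be advertised to the agent, however large the registry grows.
FACADE_TOOLS = [search_operations, describe_operation, invoke_operation]

matches = search_operations("segment")
print(matches)
print(describe_operation("segments.count"))
print(invoke_operation("segments.count", segment="vip"))
```

The context-window saving comes from the fact that tool descriptions for the full registry are never loaded up front; the agent pays for a description only after a search surfaces it.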

Lovable MCP Server (Research Preview, May 7). Lets developers create, iterate, and deploy Lovable apps from a terminal or AI agent over MCP. Lovable announcement

Chrome DevTools MCP (Public Preview). Exposes DevTools debugging and performance instrumentation to AI coding assistants. The agent can now reproduce a bug in a real Chrome tab and read its own network and runtime traces. Chrome blog

Coder Agents in beta. Native agent architecture for self-hosted enterprise infrastructure, for teams that cannot send code to vendor clouds. SD Times

⚖️ Policy & Regulation

EU AI Act Omnibus, follow-on detail. Beyond the high-risk delays already reported, the May 7 deal extends SME treatment (simplified technical documentation, lighter conformance) to small mid-caps up to 500 employees, and shrinks the grace period for AI-generated content transparency from six to three months (new deadline December 2, 2026). Council release

📌 Watch List

- Forward-deployed AI services: Anthropic and OpenAI launched joint ventures the same day, both targeting mid-market implementation capacity.
- AI-generated code security: Snyk's 65 to 70 percent figure puts a number on a blind spot most AppSec stacks miss.
- Stateful agent workspaces: AI Co-Mathematician suggests the next lift comes from scaffolding, not the model.
- On-device clinical models: QVAC MedPsy at 4B is small enough for hospital firewalls.
- Agent payments: AgentCore + Privy puts payment primitives inside the agent runtime.