tandemly.ai
Briefing · JUN 4 2026

June 4, 2026

AI daily briefing

🎯 Top 3 Things to Know

1. Microsoft shipped its first openly competitive in-house models: MAI-Thinking-1 (reasoning) and MAI-Code-1-Flash (5B coding), both claiming to beat or match Claude on key benchmarks. The shift matters because Microsoft has spent five years buying OpenAI capacity. Now its own reasoner posts 97.0 percent on AIME 2025 and matches Claude Opus 4.6 on SWE-Bench Pro. The 5B Flash model beats Claude Haiku 4.5 on every coding benchmark Microsoft tested and solves the same problems with up to 60 percent fewer tokens. The audience that should care: anyone running a multi-vendor coding stack, anyone budgeting around Anthropic price-per-token, and anyone watching the cost floor for "good-enough" coding models drop. Worth re-running an internal coding eval against MAI-Code-1-Flash this week before assuming Haiku is still the cheap-and-fast default. Microsoft MAI announcement

2. Trump signed an executive order on June 2 asking AI companies to give the federal government 30 days of pre-release access to frontier models, on a voluntary basis. The order asks federal agencies to build a classified benchmark for "advanced cyber capabilities," and asks labs to opt in to a pre-release security review window. It explicitly rules out mandatory licensing. This is a softer regime than the EU is shaping but a marked reversal from the Trump administration's earlier hands-off posture. Affects every US frontier lab, every enterprise downstream of one, and every government contractor that resells these models. Worth watching: which labs publicly commit and which decline, because the participation list will become the de facto definition of "trusted frontier vendor" for federal procurement. The Register · White House EO text

3. Anthropic quadrupled Project Glasswing to roughly 200 partner organizations across 15-plus countries, after the initial 50 partners surfaced more than 10,000 high or critical security flaws. Cloudflare alone found 2,000 bugs in critical-path systems, 400 of them rated high or critical. Mozilla fixed 271 vulnerabilities in Firefox 150 with the tool, more than ten times the count of the prior release. The expansion adds power, water, healthcare, communications, and hardware sectors. This is the largest public evidence yet that an LLM-driven vulnerability scanner can run continuously against production codebases without drowning maintainers in false positives. Worth watching whether the figures hold up as the partner set diversifies away from sophisticated software shops. Anthropic · Help Net Security

🚀 Frontier Models & Features

🔬 Research Worth Reading

🏢 Enterprise in the Wild

🛠️ Tooling & Ecosystem

⚖️ Policy & Regulation

📌 Watch List