Welcome to Blank Metal’s Weekly AI Headlines.
Each week, our team shares the AI stories that caught our attention—the articles, announcements, and insights we’re actually discussing internally. We curate the best of what we’re reading and add the context that matters: what happened, why it matters, and what to do about it.
The Workspace Wars Escalate
Fifteen days after Claude Cowork went GA, OpenAI, Adobe, Salesforce, and Google all shipped workspace-layer moves in a single week. The category isn’t “who has the best chat model” anymore—it’s “whose workspace runs your agents, your skills, and your governance.” If you’re planning an AI rollout for anyone other than engineers, this is the layer that matters, and every incumbent platform you already pay for is quietly repositioning to defend turf in it.
OpenAI Ships Workspace Agents in ChatGPT—The Cowork Category Is Now a Two-Vendor Race
What: OpenAI launched Workspace Agents inside ChatGPT, a goal-driven, multi-step agent surface that reads across connected tools, plans work, and delivers finished artifacts. It lands fifteen days after Anthropic took Claude Cowork out of preview, and draws directly on Codex infrastructure for the execution layer.
So What: Until last week, Anthropic owned the “workspace where AI does the work” category on its own. That’s over. Every enterprise AI conversation now has two credible Cowork-class products from the two labs most buyers are already paying, and the vendor choice collapses into a handful of real variables: connector catalog, skills format portability, admin controls, and which model your people are already using. The fact that OpenAI built on Codex rather than a clean-sheet agent runtime is also worth noting—it signals the coding-agent substrate and the workspace-agent substrate are the same product underneath.
Now What: If you’ve already committed to Claude Cowork, don’t switch—but build your governance (RBAC, connector permissions, skills architecture) in a platform-agnostic way so you can run both where it makes sense. If you haven’t committed yet, this is the moment to pilot both side-by-side against two or three of your actual workflows and decide on evidence, not on vendor preference. The category-defining feature six months from now will be skills and agent portability, not necessarily the underlying model.
Adobe Goes MCP-Native at Summit 2026—And Legacy Enterprise Platforms Just Got Interesting Again
What: Adobe announced CX Enterprise at Summit 2026: an end-to-end agentic customer-experience platform built around AI agents, reusable “agent skills,” and MCP endpoints, with a governance layer on top. Adobe Marketing Agent will appear inside Claude Enterprise, ChatGPT Enterprise, Gemini Enterprise, Copilot, and IBM watsonx Orchestrate. A new “CX Enterprise Coworker” takes a business goal (“increase cross-sell by 3%”), assembles agents, plans the work, and executes pending human approval.
So What: Two things to notice. First, MCP is now a first-class citizen inside a legacy enterprise pitch, not a developer curiosity—Adobe is betting that portable agent standards are how incumbent platforms stay relevant as the agent layer commoditizes. Second, the retrofit-versus-reengineer debate inside every enterprise just got a template: Adobe kept AEP as the contextual layer and wrapped agents around it rather than rebuilding. That’s the pattern most of you will end up following.
Now What: If you run a legacy platform of record—CRM, ERP, marketing, finance—stop waiting for the vendor to ship a “real” AI strategy. Start asking now whether they’ll expose MCP endpoints, whether their agents will run inside Claude Enterprise or ChatGPT Enterprise, and whether their skills are portable across your agent runtimes. A vendor that can’t answer those three questions by end of Q3 is a vendor you’re going to replace.
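One concrete way to start that conversation: MCP servers speak JSON-RPC 2.0, so much of the audit reduces to whether a vendor's endpoint can answer a `tools/list` request. A minimal sketch of the payloads involved follows; the client name and version are placeholders, the pinned protocol revision is one published MCP spec date, and transport details (stdio vs. HTTP) vary by vendor.

```python
import json

# Sketch of the JSON-RPC 2.0 payloads an MCP client sends to ask a server
# what tools it exposes. "vendor-audit" is a placeholder client name; check
# the vendor's docs for the actual endpoint and transport.

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # pin to a revision the server supports
        "capabilities": {},
        "clientInfo": {"name": "vendor-audit", "version": "0.1"},
    },
}

list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

# A vendor that can answer this request has a real MCP surface;
# one that can't is exposing agents only through its own UI.
print(json.dumps(list_tools))
```

If the account team can't tell you what comes back from `tools/list`, you have your answer on question one.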
Salesforce Launches Headless 360—Your Platform of Record Is Now Infrastructure for Agents
What: Salesforce unveiled Headless 360, which exposes the entire Salesforce platform as infrastructure for AI agents: data, business logic, workflows, and policy all available programmatically to any agent runtime, any model, any orchestration layer. It’s the first major CRM repositioning itself not as a destination app but as a system of record agents operate on top of.
So What: This reframes the most expensive software purchase in most enterprises. If Salesforce is infrastructure, then the value question moves from “which CRM do we pick” to “what agents sit on top of it and who controls them”—and the answer to that second question is increasingly you, not Salesforce. The deeper signal is that the incumbents have now absorbed the agent thesis: they’re not fighting it, they’re repositioning around it. Expect the same move from ServiceNow, Workday, Oracle, and SAP over the next six months.
Now What: If you’re a Salesforce customer, get ahead of this. Ask your account team where Headless 360 fits in your license, what the governance model looks like across multiple agent runtimes, and how skills and agents built against your instance survive a vendor change. If you’re evaluating CRM alternatives, the new decision criterion is: which platform will be easier to operate on top of a year from now.
Gemini Gets a Next-Generation Deep Research Agent—Research-as-Workflow, Not Research-as-Search
What: Google launched a next-generation Deep Research agent inside Gemini. It runs multi-hour investigations across the open web, synthesizes findings into structured reports, and interleaves reasoning, citations, and cross-checks instead of returning a ranked list of links.
So What: This is the first credible move from Google that positions Gemini as more than a search box with a model attached. Deep Research is a workflow product, not an answer product—the same architectural bet Claude and ChatGPT made with their respective research and agent modes. For enterprise buyers, it also forces a real choice: if your analysts start using Deep Research for diligence, market scans, or regulatory reviews, you need governance around it before it becomes the de facto research tool on your team.
Now What: If you have analysts, researchers, or consultants spending hours per week on web-synthesis work, pilot Deep Research with one of them for a week and measure the delta. If the gains are real, your next question is governance: source control, citation audit, data residency, and whether the research output can be trusted in a regulated workflow. Don’t let this diffuse through your org ungoverned—treat it like you’d treat any new research tool with internet access.
The Model Race: Coding and Life Sciences
The frontier model race kept moving on two fronts this week. Google publicly conceded Anthropic is ahead on coding and stood up a strike team to catch up. Moonshot’s open-weights Kimi K2.6 put a credible open model inside the frontier envelope for the first time. And OpenAI shipped the first vertical frontier model—GPT-Rosalind for life sciences—with named pharma customers. Two signals for enterprise buyers: vendor leadership swaps faster than your procurement cycle, and vertical frontier models are the next GTM pattern.
Google DeepMind Spins Up a Strike Team to Close the Coding Gap With Anthropic
What: The Decoder reports Google DeepMind has stood up a strike team led by Sebastian Borgeaud (formerly Gemini pre-training) focused on long-horizon coding tasks. Sergey Brin’s internal memo calls “turning our models into primary developers” the final sprint, and Google is tracking team-level usage of its internal coding tool “Jetski”—similar to Meta’s token leaderboard. Training runs on Google’s proprietary codebase.
So What: Two signals for enterprise buyers. First, Google publicly concedes Anthropic is ahead on coding—which validates most engineering teams’ current experience and shortens the “we should wait and see what Google ships” conversation. Second, the internal-tool-first strategy (Jetski) is telling: frontier labs are now treating their own engineers as the leading pilot cohort, and what ships publicly lags what’s running inside. That pattern will hold across every model family.
Now What: If you’re picking a coding model or agent platform today, pick based on what works in your team’s actual workflows now, not on vendor roadmap slides. Re-evaluate quarterly—the leader-of-the-month dynamic is real, and Google catching up is now the explicit goal. For teams running on Gemini, ask your account team directly what Jetski’s usage looks like and when those capabilities ship externally.
Moonshot’s Kimi K2.6 Puts an Open-Weights Model at the Frontier—For Long-Horizon Coding
What: Moonshot released Kimi K2.6, an open-weights coding model benchmarking neck-and-neck with GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on agentic and coding tasks. Vercel reports 50%+ gains on their Next.js benchmark. Demonstration runs include a 12-hour, 4,000-tool-call Zig inference optimization and a 13-hour autonomous rewrite of an 8-year-old matching engine (185% throughput gains). Agent Swarm now scales to 300 sub-agents across 4,000 coordinated steps.
So What: This is the first time open weights sit inside the frontier envelope for long-horizon agent work. The implications go beyond price. Open weights mean you can host the model inside your own compliance boundary, run it offline in regulated environments, fine-tune on proprietary code without sending it to a vendor, and avoid per-token pricing on the workloads that burn the most budget. The benchmarks are vendor-run—take them with a grain of salt—but the customer quotes from Vercel, Fireworks, Baseten, Ollama, and others converge on one point: long-horizon reliability is now real on open weights.
Now What: If you operate in a regulated environment or have workloads where data can’t leave your perimeter, re-open the build-versus-buy conversation on agent workloads. The calculus from a year ago—frontier models are only available as closed API products—is no longer true. Pilot K2.6 alongside your existing closed-model stack on one high-value, long-horizon workflow and compare on reliability, cost, and governance posture.
OpenAI Ships GPT-Rosalind—A Frontier Model for Life Sciences, With Named Pharma Launch Partners
What: OpenAI launched GPT-Rosalind, a frontier reasoning model for biology, drug discovery, and translational medicine, available in research preview through ChatGPT, Codex, and the API via a “trusted access program.” Launch customers include Amgen, Moderna, the Allen Institute, and Thermo Fisher. OpenAI is framing capabilities as muted today—synthesis, experimentation planning, research compilation—with autonomous scientific progress “several technical milestones away.”
So What: This is the first vertical frontier model shipped by either major lab. OpenAI is betting the next phase of enterprise AI is specialized models with curated tool access, not general-purpose models doing everything. Life sciences is the first domain because the economics are obvious and the customer list was ready—expect similar vertical frontier launches in legal, finance, and clinical care over the next year. Notably absent from the launch customer list: payers, providers, and any non-pharma healthcare organization.
Now What: If you’re in pharma, biotech, or translational medicine, ask OpenAI directly about the trusted access program—the published customer list tells you exactly who’s in the room. If you’re in adjacent regulated industries (healthcare payer/provider, legal, financial services), watch the trusted-access pattern carefully: this is likely the GTM template for every vertical frontier model that follows, and getting in early matters more than the model’s current capability ceiling.
The Enterprise Realities
The same week three vendors reframed the workspace layer, three stories from the field reframed how you should actually buy and build. Proprietary formats are becoming liabilities as AI-native tools route around them. SpaceX on Cursor puts a reference customer on the table that answers the hardest security objection in any AI coding tool RFP. And a clean TensorZero analysis shows that most enterprise AI budgets are built on list-price comparisons that are off by 2–5x. Your AI cost, tool choice, and vendor audit all need a refresh this quarter.
Anthropic Ships Claude Design—And Figma’s Locked Format Has an Agentic-Era Problem
What: Anthropic launched Claude Design as part of Claude Labs—a generative design workflow that takes prompts to production-quality UI and interactive prototypes without leaving Claude. A widely shared analysis from Sam Henri argues that Figma’s file format, largely undocumented and hard to work with programmatically, accidentally excluded Figma from the training data that would make it relevant in the agentic era.
So What: The pattern matters beyond design. Every proprietary file format that’s hard to parse programmatically is now at risk of being routed around by AI-native tooling. Claude Design didn’t beat Figma on features—it made Figma’s closed format a liability instead of a moat. The same dynamic will play out for any vendor whose lock-in depends on an opaque format: BIM, CAD, proprietary PM tools, specialized ERP schemas. Open or interoperable formats gain value; closed formats become tech debt.
Now What: If you maintain internal tools or vendor contracts that depend on a closed format, audit them. Ask whether the format is machine-readable, whether it’s documented, whether an AI agent could roundtrip through it. If the answer is no, start planning the migration now—not because AI replaces the tool tomorrow, but because the tool’s value compounds against you every quarter the agent layer gets better.
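The roundtrip part of that audit can be mechanical: read a file, re-serialize it, and check you get an equivalent document back. The sketch below uses JSON as a stand-in for whatever export format your vendor offers; the test shape is the same for any format, and a format that fails it is one an agent can't safely operate on.

```python
import json

def roundtrips(raw: str) -> bool:
    """Can we parse this document and re-serialize it losslessly?
    JSON is a stand-in for the vendor's actual export format."""
    try:
        doc = json.loads(raw)                       # machine-readable at all?
        return json.loads(json.dumps(doc)) == doc   # lossless after re-serialization?
    except ValueError:
        return False

# An open, documented export passes; an opaque binary blob does not.
print(roundtrips('{"layers": [{"name": "header"}]}'))
print(roundtrips("\x00\x01proprietary-blob"))
```

Run the same check against every format your internal tools produce or consume, and rank migration priority by which ones fail.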
SpaceX Picks Cursor—Enterprise IDE Adoption at Scale
What: The New York Times reports SpaceX standardized on Cursor for engineering. Details on team size and license counts aren’t public, but SpaceX is one of the largest and most security-conscious software engineering organizations in the world, and the pick validates Cursor as an enterprise-grade tool rather than a startup productivity play.
So What: This is the most significant enterprise reference for any AI coding tool to date. SpaceX’s security posture, classification requirements, and engineering culture make it an unusually strict buyer—the fact that Cursor cleared the bar tells you that enterprise-ready features (SSO, audit logs, IP protection, custom model routing, offline modes) have caught up to what large orgs need. Expect this reference to show up in every AI coding tool RFP this quarter.
Now What: If you have engineers evaluating AI coding tools, the SpaceX reference gives your security team an answer to the hardest objection: “no one at our scale runs this yet.” That’s no longer true. If you’re at the enterprise buyer stage, ask each candidate vendor what their largest production customer looks like, what SOC 2 Type II evidence they can share, and what their model-routing and IP-protection story is. The answers have gotten meaningfully better in the last 90 days.
Stop Comparing Price Per Million Tokens—Tokenization Can Make Claude 5x More Expensive Than the List Price Suggests
What: A TensorZero analysis shows that because different models tokenize text differently, real-world cost can diverge sharply from list price. On some workloads, Claude tokens end up costing 5x more than GPT tokens despite Claude’s list price being only 2x GPT’s. The gap is driven by how each tokenizer splits text—code, structured data, and non-English content all produce different token counts per byte.
So What: Most enterprise AI budgets are built on list-price comparisons that are off by 2–5x. That’s not a rounding error—it’s the difference between a model being affordable at scale and being cost-prohibitive. The broader point is that the economics of AI workloads aren’t legible from vendor pricing pages alone. Real cost depends on your actual text, your actual prompts, and your actual workflows—and it requires instrumentation to see.
Now What: Before your next model-selection decision, run a representative 100-prompt sample through each candidate vendor, count tokens on both the input and output sides, and multiply by each vendor’s list price. Do this for every workload shape (code, structured data, long documents, conversational). You’ll almost certainly find that the “cheaper” model on the sticker is not the cheaper model in practice. Also: this is the single strongest argument for model-routing architecture—the right model for the workload beats the cheapest model by list price, every time.
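The arithmetic itself is trivial once you have real token counts from your sample. A sketch of the comparison, with illustrative (not measured) token counts and list prices, showing how a denser tokenizer can beat a cheaper sticker price on the same prompt:

```python
# Hypothetical effective-cost comparison: same text, different tokenizers.
# All token counts and per-million-token prices below are illustrative.

def effective_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Dollar cost of one request at the vendor's list price."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Suppose the same structured-data prompt tokenizes very differently:
vendor_a = effective_cost(4_000, 1_200, price_in_per_m=3.00, price_out_per_m=15.00)  # denser tokenizer
vendor_b = effective_cost(9_500, 2_900, price_in_per_m=1.50, price_out_per_m=7.50)   # sparser tokenizer

# Vendor B is "half price" on the sticker but costs more per request here.
print(f"vendor A: ${vendor_a:.4f}  vendor B: ${vendor_b:.4f}")
```

Swap in measured counts from your own 100-prompt sample per workload shape, and the routing decision falls out of the spreadsheet.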



