The So What

Weekly Headlines: Issue #20

Blank Metal — Fri, 01 May 2026 17:58:43 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Each week, our team shares the AI stories that caught our attention—the articles, announcements, and insights we’re actually discussing internally. We curate the best of what we’re reading and add the context that matters: what happened, why it matters, and what to do about it.

The AI Subsidy Era Ends

The cheap-token era is closing. For 18 months, every enterprise AI roadmap was built on subsidized inference assumptions—prices falling quarter over quarter, vendors absorbing compute costs, flat-rate enterprise contracts capping the downside. This week, every one of those assumptions broke at once. Three frontier-pricing changes, one budget blowout, and one canonical “AI bundled into a flat license” product moving to metered billing all landed inside seven days. Time to recalc.

OpenAI Doubles GPT-5.5’s API Price—Efficiency Gains Don’t Cover It

What: OpenAI launched GPT-5.5 on April 23 and doubled the API price along with it. Input tokens move from $2.50 to $5.00 per million; output tokens move from $15.00 to $30.00 per million. OpenAI’s stated rationale is that GPT-5.5 is more efficient and needs fewer tokens for comparable tasks. Independent testing from Artificial Analysis found effective API costs roughly 20% higher than the prior GPT-5.4 line—efficiency gains offset, but didn’t erase, the headline price hike.

So What: This is the first frontier-model release in 18 months that didn’t pretend to be cost-neutral. The script for every prior launch was the same—new model, same price, occasional discount. GPT-5.5 doubled the sticker. The framing matters: OpenAI is signaling that capability gains now ship at premium pricing, and efficiency improvements go to vendor margin first. Anyone building production features on the GPT line just had their unit economics recalibrated without warning.

Now What: If you’re running production workloads on GPT-5.x, redo the math on cost-per-task before the next quarterly review. The 20% effective-cost increase on identical work is the floor—token-heavy patterns (agents, long-context reasoning, multi-turn) feel it more. Run a model bake-off on real internal examples, not benchmark suites. The cheaper tiers (GPT-5.5 mini, open-weights, Claude Haiku) handle more than most teams assume.

Anthropic Moves Enterprise Customers Off Flat-Rate Pricing

What: The Information reported that Anthropic is moving select enterprise customers off flat-rate contracts onto usage-based billing, citing demand outpacing compute supply. Customers who locked in fixed-fee enterprise terms over the last year are being asked to renegotiate against a pricing model pegged to actual token consumption.

So What: This is the same story as the GPT-5.5 price hike from a different angle. Two of three frontier vendors are simultaneously signaling that the flat-rate, capped-cost enterprise contract is no longer the default—and the trigger is compute scarcity, not competition. Buyers who anchored AI budgets on predictable monthly billing are about to discover what their actual usage costs at retail.

Now What: If your company has a flat-rate Anthropic contract up for renewal in 2026, build the usage-based scenario now. Pull six months of token logs by use case, model the cost at retail rates, then negotiate from a number rather than a feeling. If you’re still in a flat-rate tier, audit which consumption patterns the vendor would charge you for under metered billing—the workloads that look ugliest under that model are your highest-leverage targets for compression or migration.

Tokenmaxxing Isn’t a Productivity Metric

What: The Register published a deep look at token economics on April 26. ML researcher Devansh calculated theoretical inference cost on an H100 at $0.0038 per million tokens at full utilization, rising to $0.013 at 30% utilization and $0.038 at 10%. Anthropic’s Opus 4.7 lists at $5/M input and $25/M output—orders of magnitude above bare-metal cost. Devansh on token-volume KPIs at Meta and Shopify: “Is token spend directly correlated with productivity? Absolutely not.” Future Tech Enterprise CEO Bob Venero added that hardware costs are 3x what they were six months ago, and only 15% of AI prototypes reach production without guidance—45-50% with proper planning.

So What: The premium between bare inference cost and frontier-model retail isn’t going to compress on its own. Vendors charge what the market bears, and the market still bears a lot because most enterprise buyers don’t have a clean cost-per-task baseline to negotiate against. Worse, “tokens consumed” has crept into corporate scorecards as a proxy for AI productivity—a metric that rewards waste. If your team is measured on tokens used, you’re going to get tokens used.

Now What: Stop measuring AI adoption by token volume. Pick three AI-powered workflows in your company, compute cost-per-completed-task, and put that number on a leadership dashboard instead. Then run the same workflows against a smaller model, an open-weights alternative, or a deterministic non-LLM approach where one exists. The 3x hardware cost gap means the self-hosting math has shifted in the last six months too—revisit it.

Uber Blew Through Its Full 2026 AI Budget on Tokens by April

What: Axios reported on April 26 that Uber’s CTO consumed Uber’s full 2026 AI budget on token costs alone before the year was halfway done. The piece, sourced back to The Information, frames a broader pattern: IT budgets are blowing out as token spend on agents, code-gen, and copilots overruns multi-quarter projections.

So What: Uber is not a sloppy buyer. If their CTO modeled a year of spend and got blown out by token usage at the halfway mark, the modeling assumptions everyone built on—token prices keep falling, vendor pricing stays flat, agentic workloads consume linearly—were all wrong. The asymmetry between flat-rate vendor signaling and actual consumption growth is now showing up in board-level finance reviews, not just engineering retros.

Now What: If your 2026 AI budget was set in Q4 2025, assume it’s wrong by 50-200% on token-dependent line items. Get monthly token consumption visibility by team and use case before mid-year. The teams most exposed are the ones who shipped agentic workflows in Q1—those are 10-20 LLM calls per task instead of one, and the cost compounds. A simple guardrail: cap token spend per workflow at the level where it stops being cheaper than human time, then look hard at any workflow stuck against the cap.

GitHub Copilot Shifts to Metered Billing—Annual Subscribers Pay 27x for Opus

What: GitHub announced on April 28 that Copilot will move from request-based to token-based billing effective June 1, 2026. New tiers: Pro at $10/month for 1,000 AI Credits, Pro+ at $39 for 3,900, Business at $19/user for 1,900, Enterprise at $39/user for 3,900. Annual subscribers face dramatically higher model multipliers under the new system—Claude Opus 4.7’s multiplier rises from 7.5x to 27x. GitHub CPO Mario Rodriguez: “Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount. GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable.”

So What: Copilot was the canonical example of “AI bundled into a flat seat license.” That bundle was profitable when sessions were short and models were cheap. Both assumptions broke. Coding agents that run for hours, not seconds, are the new default usage pattern—and GitHub just told its 25M+ users that the bill for that pattern lives with them now, not Microsoft. Expect the same shift across every AI feature currently buried in a flat-rate developer tool license.

Now What: If your engineering org standardized on Copilot under a flat-license assumption, your per-developer cost is about to become variable and individually unbounded. Start tracking session length and model selection by user, decide which tiers map to which engineer cohorts, and write a usage policy before someone runs an Opus session over a long weekend. The teams who’ll feel this most are the ones who treated agent mode as the default—Pro+ at 3,900 credits doesn’t go far against a 27x multiplier.

The Capital Behind the Curtain

Behind every pricing change in the prior section is a capital structure that requires it. Hyperscalers and frontier labs are now financially entangled at a scale that determines what models you can buy, at what price, and from whom. Two headline numbers this week made the entanglement legible.

Big Tech AI Capex Hits $600B for 2026—And Cash Flow Can’t Keep Up

What: Reporting this week pegs combined 2026 AI capex from Alphabet, Microsoft, Meta, and Amazon at roughly $600 billion. Joe Maginot of Madison Investments: “These have been businesses that generated significant amounts of free cash flow and today, pretty much all operating cash flow is being consumed in capex.” Melissa Otto of S&P Global Visible Alpha on Microsoft: “The company is going to have to speak about why their business model isn’t going to get meaningfully disrupted in AI.”

So What: This is the supply side of the same story driving every pricing change in this issue. The hyperscalers have committed to spending the equivalent of two Manhattan Projects on AI infrastructure this year, and they need that spend to convert into recurring revenue at meaningfully higher margins than current AI services produce. The math doesn’t work at flat-rate pricing—it doesn’t even work at current usage-based pricing if token consumption stops compounding. Expect the next 18 months to be defined by vendors figuring out how to capture more revenue per token consumed, not less.

Now What: Treat any AI vendor pricing announcement in 2026 as a leading indicator, not a stable input. Negotiate price-protection language into multi-year contracts—floor caps on annual increases, locked rate cards for committed volumes, ramp-down protection if internal usage projections miss. If your company is publicly traded, your CFO is going to get the same Visible Alpha question Microsoft got: how does the model survive if frontier-API pricing doubles again? Have an answer.

Google Commits Up to $40B to Anthropic—Compute Is the New Currency

What: Google announced on April 24 that it will invest up to $40 billion in Anthropic—$10 billion now in cash at a $350 billion valuation, with another $30 billion contingent on performance milestones. Google Cloud also committed five gigawatts of computing power across a five-year window, with optionality for several more gigawatts. Prior to this round, Google’s stake in Anthropic was reportedly 14% from $3 billion in earlier rounds. The structure mirrors Anthropic’s earlier deal with Amazon—$5 billion now, up to $20 billion against milestones.

So What: A direct competitor (Google has Gemini) is making the largest single AI investment ever recorded—into a company building competing models—because compute access has become more strategic than market share. The entire frontier-model field now runs on capital from the same three hyperscalers it competes against. For enterprise buyers, this consolidation is invisible during good quarters and very visible the moment a model vendor’s compute partner has competing priorities.

Now What: When you negotiate a multi-year AI contract, ask which hyperscaler hosts the model you’re committing to. Then ask what happens if that hyperscaler’s AI roadmap diverges from your vendor’s. The answer determines whether you have one supplier or three. For workloads where this matters—regulated, mission-critical, or strategically differentiating—architect for portability across providers from day one. Single-vendor lock-in is more expensive in this market than it has been since the 1990s mainframe contracts.

Enterprise Stacks Restructure for Agents

While the cost economics shifted, the infrastructure layer kept moving. The most defended interface in finance committed to a chat front end, Microsoft bundled its agent governance plane into a new flagship SKU, and Linear made itself a node in the agent network instead of a destination application. The pattern across all three: every enterprise stack is being rebuilt around the assumption that an agent—not a person—will be the primary user.

Bloomberg Terminal Bets Its Future on a Chat Interface

What: WIRED reported on April 28 that Bloomberg is testing a chatbot-style interface for the Terminal called ASKB, built atop a basket of language models. The beta is open to roughly a third of the Terminal’s 375,000 users. Bloomberg CTO Shawn Edwards: “This will be the new terminal. The primary way most interactions happen.” The Terminal now ingests weather forecasts, shipping logs, factory locations, consumer spending patterns, and private loan data alongside traditional market data—and Edwards’s framing is that the data volume has made command-line keystroke navigation untenable. ASKB supports workflow templates with scheduled or conditional triggers; an earnings-season template can pull competitor comparisons, fundamentals, and Wall Street expectations and generate a long/short summary automatically.

So What: The Bloomberg Terminal is the most defended interface in finance. Every senior trader, analyst, and asset manager has 25 years of muscle memory for the keystroke shortcuts—it’s the “Excel of finance” with even higher switching costs. Bloomberg’s CTO publicly committing to chat as the primary interaction mode is a forcing event for every other enterprise software vendor whose product is fundamentally a structured query system over a proprietary data set. If Bloomberg can rebuild itself around an LLM front end, no entrenched workflow tool is safe behind a “but our users won’t change” defense.

Now What: If your company runs on a structured-data interface—internal BI tool, ticketing system, CRM, ERP module, custom dashboard—the question is no longer whether a chat layer will replace the keystroke layer. The question is whether you build it or your software vendor does. Build it where the data and workflow are differentiating to your business. Let the vendor build it where the underlying data is commodity. The middle option—wait and see—is getting more expensive every quarter.

Microsoft Bundles Copilot and Agent 365 Into a New “Frontier Suite”

What: Microsoft announced that Microsoft 365 E5, Entra Suite, Copilot, and Agent 365 are being bundled and transact-able as Microsoft 365 E7—the Frontier Suite—available in Cloud Solution Provider channels starting May 1, 2026. The bundle pairs E5’s secure productivity stack with Entra for identity and access, Copilot for AI in workflow, and Agent 365 as the control plane for governing and scaling agents.

So What: This is Microsoft’s bet that enterprise AI is now a stack-level purchase, not a per-feature add-on. Agent 365 as the “control plane” framing matters—Microsoft is trying to own the governance layer for any agent running inside your tenant, regardless of who built it. If E7 becomes the standard SKU for AI-enabled enterprises, Microsoft captures both the productivity revenue and the agent-governance revenue, and every other agent vendor becomes a participant in Microsoft’s governance plane rather than a peer to it.

Now What: If your company is on E5 already, your Microsoft account team is going to pitch E7 within 30 days. Before that meeting, decide whether you want Microsoft as your agent governance plane or whether you’d rather build or buy that layer separately. The answer changes the math on E7’s premium and the architecture of every agent project on your roadmap. Either path is defensible; drifting into E7 by inertia and then trying to govern non-Microsoft agents around it is the worst of both options.

Linear Goes Bidirectional on MCP—Becomes a Node in the Agent Network

What: Linear shipped Agent MCP support on April 23, letting Linear Agent connect to external tools via Model Context Protocol—pulling context from Granola meeting notes into project updates, using Glean to draft project specs, turning Notion interview notes into customer requests, validating product hypotheses against PostHog data. Admins can control access with allowlists and workspace-level MCP permissions. Linear also expanded its own MCP server with support for initiatives, project milestones, and updates—so tools like Cursor and Claude can read and write back to Linear.

So What: Linear is small relative to the Bloombergs and Microsofts in this issue, but the architecture decision is more consequential than the size suggests. By exposing Linear bidirectionally over MCP—both as a server and as a client—Linear stopped being a destination application and started being a node in an agent network. Every tool exposed this way becomes more useful when AI is in the loop and less useful when it isn’t. The opposite move (close the API, build a walled-garden AI experience) is what several incumbents shipped this quarter, and it’s a defensive play. Linear’s move is offensive.

Now What: Audit your internal tool stack for which tools have MCP support, which have an OpenAPI spec that could be wrapped, and which are AI-hostile. The AI-hostile tools will feel slower, dumber, and more expensive every quarter—because every other tool in the stack is getting an agent layer and they aren’t. For the agent-friendly tools, decide which become the system of record your agents read from and write to, and start building workflow templates that span them. Companies treating MCP as an integration spec rather than a feature are setting themselves up for the agent-centric stack everyone will have by 2027.

Weekly Headlines: Issue #19

Blank Metal — Fri, 24 Apr 2026 13:01:12 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

The Workspace Wars Escalate

Fifteen days after Claude Cowork went GA, OpenAI, Adobe, Salesforce, and Google all shipped workspace-layer moves in a single week. The category isn’t “who has the best chat model” anymore—it’s “whose workspace runs your agents, your skills, and your governance.” If you’re planning an AI rollout for anyone other than engineers, this is the layer that matters, and every incumbent platform you already pay for is quietly repositioning to defend turf in it.

OpenAI Ships Workspace Agents in ChatGPT—The Cowork Category Is Now a Two-Vendor Race

What: OpenAI launched Workspace Agents inside ChatGPT, a goal-driven, multi-step agent surface that reads across connected tools, plans work, and delivers finished artifacts. It lands 15 days after Anthropic took Claude Cowork out of preview, and draws directly on Codex infrastructure for the execution layer.

So What: Until last week, Anthropic owned the “workspace where AI does the work” category on its own. That’s over. Every enterprise AI conversation now has two credible Cowork-class products from the two labs most buyers are already paying, and the vendor choice collapses into a handful of real variables: connector catalog, skills format portability, admin controls, and which model your people are already using. The fact that OpenAI built on Codex rather than a clean-sheet agent runtime is also worth noting—it signals the coding-agent substrate and the workspace-agent substrate are the same product underneath.

Now What: If you’ve already committed to Claude Cowork, don’t switch—but build your governance (RBAC, connector permissions, skills architecture) in a platform-agnostic way so you can run both where it makes sense. If you haven’t committed yet, this is the moment to pilot both side-by-side against two or three of your actual workflows and decide on evidence, not on vendor preference. The category-defining feature six months from now will be skills and agent portability, not necessarily the underlying model.

Adobe Goes MCP-Native at Summit 2026—And Legacy Enterprise Platforms Just Got Interesting Again

What: Adobe announced CX Enterprise at Summit 2026: an end-to-end agentic customer-experience platform built around AI agents, reusable “agent skills,” and MCP endpoints, with a governance layer on top. Adobe Marketing Agent will appear inside Claude Enterprise, ChatGPT Enterprise, Gemini Enterprise, Copilot, and IBM watsonx Orchestrate. A new “CX Enterprise Coworker” takes a business goal (”increase cross-sell by 3%”), assembles agents, plans, and executes pending human approval.

So What: Two things to notice. First, MCP is now a first-class citizen inside a legacy enterprise pitch, not a developer curiosity—Adobe is betting that portable agent standards are how incumbent platforms stay relevant as the agent layer commoditizes. Second, the retrofit-versus-reengineer debate inside every enterprise just got a template: Adobe kept AEP as the contextual layer and wrapped agents around it rather than rebuilding. That’s the pattern most of you will end up following.

Now What: If you run a legacy platform of record—CRM, ERP, marketing, finance—stop waiting for the vendor to ship a “real” AI strategy. Start asking now whether they’ll expose MCP endpoints, whether their agents will run inside Claude Enterprise or ChatGPT Enterprise, and whether their skills are portable across your agent runtimes. A vendor that can’t answer those three questions by end of Q3 is a vendor you’re going to replace.

Salesforce Launches Headless 360—Your Platform of Record Is Now Infrastructure for Agents

What: Salesforce unveiled Headless 360, which exposes the entire Salesforce platform as infrastructure for AI agents: data, business logic, workflows, and policy all available programmatically to any agent runtime, any model, any orchestration layer. It’s the first major CRM repositioning itself not as a destination app but as a system of record agents operate on top of.

So What: This reframes the most expensive software purchase in most enterprises. If Salesforce is infrastructure, then the value question moves from “which CRM do we pick” to “what agents sit on top of it and who controls them”—and the answer to that second question is increasingly you, not Salesforce. The deeper signal is that the incumbents have now absorbed the agent thesis: they’re not fighting it, they’re repositioning around it. Expect the same move from ServiceNow, Workday, Oracle, and SAP over the next six months.

Now What: If you’re a Salesforce customer, get ahead of this. Ask your account team where Headless 360 fits in your license, what the governance model looks like across multiple agent runtimes, and how skills and agents built against your instance survive a vendor change. If you’re evaluating CRM alternatives, the new decision criterion is: which platform will be easier to operate on top of a year from now.

Gemini Gets a Next-Generation Deep Research Agent—Research-as-Workflow, Not Research-as-Search

What: Google launched a next-generation Deep Research agent inside Gemini. It runs multi-hour investigations across the open web, synthesizes findings into structured reports, and interleaves reasoning, citations, and cross-checks instead of returning a ranked list of links.

So What: This is the first credible move from Google that positions Gemini as more than a search box with a model attached. Deep Research is a workflow product, not an answer product—the same architectural bet Claude and ChatGPT made with their respective research and agent modes. For enterprise buyers, it also forces a real choice: if your analysts start using Deep Research for diligence, market scans, or regulatory reviews, you need governance around it before it becomes the de facto research tool on your team.

Now What: If you have analysts, researchers, or consultants spending hours per week on web-synthesis work, pilot Deep Research against one of them for a week and measure the delta. If the gains are real, your next question is governance: source control, citation audit, data residency, and whether the research output can be trusted in a regulated workflow. Don’t let this diffuse through your org ungoverned—treat it like you’d treat any new research tool with internet access.

The Model Race: Coding and Life Sciences

The frontier model race kept moving on two fronts this week. Google publicly conceded Anthropic is ahead on coding and stood up a strike team to catch up. Moonshot’s open-weights Kimi K2.6 put a credible open model inside the frontier envelope for the first time. And OpenAI shipped the first vertical frontier model—GPT-Rosalind for life sciences—with named pharma customers. Two signals for enterprise buyers: vendor leadership swaps faster than your procurement cycle, and vertical frontier models are the next GTM pattern.

Google DeepMind Spins Up a Strike Team to Close the Coding Gap With Anthropic

What: The Decoder reports Google DeepMind has stood up a strike team led by Sebastian Borgeaud (formerly Gemini pre-training) focused on long-horizon coding tasks. Sergey Brin’s internal memo calls “turning our models into primary developers” the final sprint, and Google is tracking team-level usage of its internal coding tool “Jetski”—similar to Meta’s token leaderboard. Training runs on Google’s proprietary codebase.

So What: Two signals for enterprise buyers. First, Google publicly concedes Anthropic is ahead on coding—which validates most engineering teams’ current experience and shortens the “we should wait and see what Google ships” conversation. Second, the internal-tool-first strategy (Jetski) is telling: frontier labs are now treating their own engineers as the leading pilot cohort, and what ships publicly lags what’s running inside. That pattern will hold across every model family.

Now What: If you’re picking a coding model or agent platform today, pick based on what works in your team’s actual workflows now, not on vendor roadmap slides. Re-evaluate quarterly—the leader-of-the-month dynamic is real, and Google catching up is now the explicit goal. For teams running on Gemini, ask your account team directly what Jetski’s usage looks like and when those capabilities ship externally.

Moonshot’s Kimi K2.6 Puts an Open-Source Model at the Frontier—For Long-Horizon Coding

What: Moonshot released Kimi K2.6, an open-weights coding model benchmarking neck-and-neck with GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on agentic and coding tasks. Vercel reports 50%+ gains on their Next.js benchmark. Demonstration runs include a 12-hour, 4,000-tool-call Zig inference optimization and a 13-hour autonomous rewrite of an 8-year-old matching engine (185% throughput gains). Agent Swarm now scales to 300 sub-agents across 4,000 coordinated steps.

So What: This is the first time open weights sit inside the frontier envelope for long-horizon agent work. The implications go beyond price. Open weights mean you can host the model inside your own compliance boundary, run it offline in regulated environments, fine-tune on proprietary code without sending it to a vendor, and avoid per-token pricing on the workloads that burn the most budget. The benchmarks are vendor-run—take them with salt—but the customer quotes from Vercel, Fireworks, Baseten, Ollama, and others converge on one point: long-horizon reliability is now real on open weights.

Now What: If you operate in a regulated environment or have workloads where data can’t leave your perimeter, re-open the build-versus-buy conversation on agent workloads. The calculus from a year ago—frontier models are only available as closed API products—is no longer true. Pilot K2.6 alongside your existing closed-model stack on one high-value, long-horizon workflow and compare on reliability, cost, and governance posture.

OpenAI Ships GPT-Rosalind—A Frontier Model for Life Sciences, With Named Pharma Launch Partners

What: OpenAI launched GPT-Rosalind, a frontier reasoning model for biology, drug discovery, and translational medicine, available in research preview through ChatGPT, Codex, and the API via a “trusted access program.” Launch customers include Amgen, Moderna, the Allen Institute, and Thermo Fisher. OpenAI is framing capabilities as muted today—synthesis, experimentation planning, research compilation—with autonomous scientific progress “several technical milestones away.”

So What: This is the first vertical frontier model shipped by either major lab. OpenAI is betting the next phase of enterprise AI is specialized models with curated tool access, not general-purpose models doing everything. Life sciences is the first domain because the economics are obvious and the customer list was ready—expect similar vertical frontier launches in legal, finance, and clinical care over the next year. Notably absent from the launch customer list: payers, providers, and any non-pharma healthcare organization.

Now What: If you’re in pharma, biotech, or translational medicine, ask OpenAI directly about the trusted access program—the published customer list tells you exactly who’s in the room. If you’re in adjacent regulated industries (healthcare payer/provider, legal, financial services), watch the trusted-access pattern carefully: this is likely the GTM template for every vertical frontier model that follows, and getting in early matters more than the model’s current capability ceiling.

The Enterprise Realities

The same week three vendors reframed the workspace layer, three stories from the field reframed how you should actually buy and build. Proprietary formats are becoming liabilities as AI-native tools route around them. SpaceX on Cursor puts a reference customer on the table that answers the hardest security objection in any AI coding tool RFP. And a clean Tensorzero analysis shows that most enterprise AI budgets are built on list-price comparisons that are off by 2-5x. Your AI cost, tool choice, and vendor audit all need a refresh this quarter.

Anthropic Ships Claude Design—And Figma’s Locked Format Has an Agentic-Era Problem

What: Anthropic launched Claude Design as part of Claude Labs—a generative design workflow that takes prompts to production-quality UI and interactive prototypes without leaving Claude. A widely-shared analysis from Sam Henri argues Figma’s largely-undocumented, hard-to-work-with-programmatically file format accidentally excluded Figma from the training data that would make it relevant in the agentic era.

So What: The pattern matters beyond design. Every proprietary file format that’s hard to parse programmatically is now at risk of being routed around by AI-native tooling. Claude Design didn’t beat Figma on features—it made Figma’s closed format a liability instead of a moat. The same dynamic will play out for any vendor whose lock-in depends on an opaque format: BIM, CAD, proprietary PM tools, specialized ERP schemas. Open or interoperable formats gain value; closed formats become tech debt.

Now What: If you maintain internal tools or vendor contracts that depend on a closed format, audit them. Ask whether the format is machine-readable, whether it’s documented, whether an AI agent could roundtrip through it. If the answer is no, start planning the migration now—not because AI replaces the tool tomorrow, but because the tool’s value compounds against you every quarter the agent layer gets better.

SpaceX Picks Cursor—Enterprise IDE Adoption at Scale

What: The New York Times reports SpaceX standardized on Cursor for engineering. Details on team size and license counts aren’t public, but SpaceX is one of the largest and most security-conscious software engineering organizations in the world, and the pick validates Cursor as an enterprise-grade tool rather than a startup productivity play.

So What: This is the most significant enterprise reference for any AI coding tool to date. SpaceX’s security posture, classification requirements, and engineering culture make it an unusually strict buyer—the fact that Cursor cleared the bar tells you that enterprise-ready features (SSO, audit logs, IP protection, custom model routing, offline modes) have caught up to what large orgs need. Expect this reference to show up in every AI coding tool RFP this quarter.

Now What: If you have engineers evaluating AI coding tools, the SpaceX reference gives your security team an answer to the hardest objection: “no one at our scale runs this yet.” That’s no longer true. If you’re at the enterprise buyer stage, ask each candidate vendor what their largest production customer looks like, what SOC 2 Type II evidence they can share, and what their model-routing and IP-protection story is. The answers have gotten meaningfully better in the last 90 days.

Stop Comparing Price Per Million Tokens—Tokenization Can Make Claude 5x More Expensive Than the List Price Suggests

What: A Tensorzero analysis shows that because different models tokenize text differently, real-world cost can diverge sharply from list price. On some workloads, Claude tokens end up costing 5x more than GPT tokens despite Claude’s list price being only 2x. The gap is driven by how each tokenizer splits text—code, structured data, and non-English content all produce different token counts per byte.

So What: Most AI budgets in enterprise are built on list-price comparisons that are off by 2–5x. That’s not a rounding error—it’s the difference between a model being affordable at scale and being cost-prohibitive. The broader point is that the economics of AI workloads aren’t legible from vendor pricing pages alone. Real cost depends on your actual text, your actual prompts, and your actual workflows—and it requires instrumentation to see.

Now What: Before your next model-selection decision, run a representative 100-prompt sample through each candidate vendor, count tokens on both the input and output sides, and multiply by each vendor’s list price. Do this for every workload shape (code, structured data, long documents, conversational). You’ll almost certainly find that the “cheaper” model on the sticker is not the cheaper model in practice. Also: this is the single strongest argument for model-routing architecture—the right model for the workload beats the cheapest model by list price, every time.

Welcome to the Great Reinvention

Blank Metal — Thu, 23 Apr 2026 20:39:03 GMT

I listened to Nikhyl Singhal on Lenny’s podcast this week. It’s the most salient take I’ve heard in months on what’s actually happening in tech, and if you lead a company, hire product/design/tech people, or are trying to figure out what to do with the org you built over the last five years, you should listen to the whole thing before you read what follows.

His argument in one paragraph: the product management role is splitting in two. “Information movers,” whose day is framing and shuttling information up and down the org, are becoming dinosaurs. “Builders” who ship, prototype, and have direct product instincts are in a renaissance. Half the current PM population is in the first camp. The next 12–24 months will be the most chaotic period in PM history, with massive shedding and rehiring. Companies will let thousands of people go and rehire thousands of others, all AI-first, radically different skills, higher comp, everything different. The only way through is to cross a personal reinvention threshold and find a moment of joy in the new way of working.

Go listen. I’m not going to recap it. What follows is what it unlocked for me.

The split is happening in every function

Nikhyl was talking to PMs. I work with CEOs, COOs, and CPOs across the enterprise, and the builder / information-mover split isn’t a PM problem. It’s a knowledge-work problem.

The same split is showing up everywhere: marketing, sales ops, finance, HR, legal, customer operations, service delivery. Every function has a population of builders, people whose instinct is to ship, prototype, automate, and own outcomes, and a population of information movers, people whose value was routing, reframing, and coordinating. AI is eating the second group’s job description first, because that’s where the leverage is highest and the risk is lowest.

PMs are the canary. If you lead a non-product function and you’re watching this happen in product thinking “glad that’s not me,” then you’re not paying close enough attention.

Companies have the same threshold to cross

The most important idea in the episode is the reinvention threshold. Nikhyl’s point is that every knowledge worker right now has to make a very specific internal decision: I am going to reinvent my craft, and I’m going to put that above the other things I’ve been protecting. It’s not a training program. It’s not a mindset session. It’s a conscious reordering of priorities, and until you cross it, nothing else works. You can consume all the AI content you want and still be on the wrong side of the line.

What nobody is saying out loud is that companies have the exact same threshold. And most of them haven’t crossed it either.

What I see in enterprises right now is a lot of activity that looks like change and isn’t. AI strategy decks. Copilot pilots. Innovation sprints. Center-of-excellence PowerPoints. Real effort, almost none of it touching the thing that actually has to change: how work gets done, who does it, what gets paid for, and what gets measured.

Strategy without operating model change is theater. The companies that win the next two years are the ones whose CEOs look at their org chart, their process library, their vendor stack, and their job architecture and say “we are going to rebuild this,” not “we are going to layer AI on top of this.”

That’s the company-level threshold. It’s as scary as the individual one, because it means admitting that a lot of what got you here is what’s holding you back. Nikhyl calls this the “shadow superpower” — the skills and systems that made you successful in the last era are the exact thing blocking you from the next one. Shadow superpowers don’t just belong to senior ICs. They belong to entire operating models.

The equal disappointment algorithm scales up

Before the how-to: a word about the weight of the ask, because I don’t want it misread.

Nikhyl has a line about mid-career professionals in their “power years,” the decade or so when you’ve finally figured out your craft and the people around you demand the most of it, having eight hours of supply and twenty hours of demand: work, partner, kids, aging parents, health, friends. His framing is that your only workable strategy is to equally disappoint everyone, because you can’t meet full demand from any one constituency.

That’s the individual version. It’s also the CEO’s version. Every enterprise leader I talk to is running an equal-disappointment algorithm across their board, their customers, their employees, their regulators, and their own family.

But the algorithm already has a hierarchy built in. Your kids aren’t negotiable. Your partner isn’t a line item next to a quarterly review. Your health isn’t optional. The question isn’t who to disappoint to make room for reinvention. It’s which work actually matters, and which doesn’t.

You don’t steal hours from your kids. You steal them from the steering committee, the status report, the stakeholder tour, the deck review, the meeting that could have been an email. Most leaders never make that move because they’ve never explicitly ranked their work against itself. Everything at work feels load-bearing until you force yourself to look.

The reason most CEOs stall at the threshold isn’t that they don’t see it. It’s that they’re already maxed out keeping the current system running, and reinvention feels like one more thing to add on top. It isn’t. Trade work that doesn’t matter for it. That trade is hard, it’s political, and it’s the only one that actually works.

One more thing worth holding onto here: this chaos has an end. Nikhyl estimates about two years before the industry settles into a new operating equilibrium, with new rituals, new roles, new expectations. That’s the tunnel. It’s loud, it’s exhausting, and it ends. Companies that try to keep every work constituency happy through it are the ones that end up shedding thousands of employees without the newly shaped people rehired.

What crossing the threshold actually looks like at scale

If you run a 40,000-person enterprise, “walk into the tunnel” is not a plan. You can’t weekend-hack your way across this threshold. But the mechanics exist, and they’re more concrete than most transformation programs admit.

Four moves I see actually working at scale:

Rewrite the job architecture, not just the training plan. Most enterprises are running AI upskilling programs against a job architecture designed for the information-mover era. You cannot reskill your way out of a structural mismatch. The work is to redefine what roles exist, what outcomes they own, and what “good” looks like in each, then reskill against the new architecture. Do it in the other order and you train people for jobs that don’t exist.

Change what gets measured and what gets promoted. Your people read the signals you send through comp, promotion, and visibility. If your top performers are still the ones who ran the best steering committee, you’re telling the organization that the old game is still the game. Promote builders. Compensate for shipped outcomes. Make the signal impossible to miss.

Put builders in the room where decisions get made. Most enterprises have builders, they’re just three layers below where strategy happens. Crossing the threshold means restructuring who’s in the room. The CEO’s staff meeting should include people who shipped something this week, not just people who manage people who manage people who shipped something.

Pick one high-stakes area and rebuild it in public. Not a pilot. Not an innovation lab. A real function, real P&L, real customers, real stakes, rebuilt from the operating model up, inside twelve months. It gives the rest of the organization a proof point they can touch, and it forces your executive team to confront the actual mechanics rather than debate them in the abstract.

None of this is easy. All of it is more concrete than “do AI transformation.” If you’re running a big company and you’re looking for where to start, start with one of these four.

What I believe right now

Six things I believe with more conviction after listening to this episode.

Builders are the only hire that makes sense. For every seat: PM, engineer, marketer, ops leader, consultant, analyst. If the person you’re hiring can’t point to something they built in the last 90 days using modern tools, they are the old model. Don’t hire them.

Hiring builders is the easy part. Keeping them is the real work. “Hire builders” is now conventional wisdom. The next failure mode, the one I’m watching play out in real time, is companies that hired builders and then dropped them into an information-mover operating model. Weekly status decks. Three-week PRD review cycles. Approval chains requiring four directors to sign off on a prototype. Builders in that environment quit inside a year. They don’t send a note; they just ship their resume to the next place. If your org has started hiring builders but hasn’t changed its rituals, measurement, or decision rights to match, you’re running the most expensive revolving door in the market.

Young talent is a cheat code, and most companies are ignoring it. I came up in an apprenticeship culture, and I think the industry forgot how valuable that is. The people with the least to unlearn are the ones who never learned the old way. A 23-year-old who came up building with modern tools, who doesn’t know what a PRD review cycle is supposed to look like, who treats Claude Code the way my generation treated email: that person has an aptitude advantage no amount of senior pattern-matching can replicate. Diversity isn’t just gender, race, and geography. It’s age. Companies only hiring fifteen-year-vets with the “right” logos are missing the single most obvious arbitrage available to them. Pair young builders with senior judgment and you get a team that moves at a pace the old model physically cannot produce.

Joy is the unlock. Nikhyl’s “moment of joy” framing is the single most useful piece of practical advice I’ve heard on how to get people through this, and it’s more specific than it sounds. He’s noticed that every person who crosses the threshold has the same kind of story: they built a small thing with modern tools and it worked. A chief-of-staff app for their inbox. A script that controls their house lights. Helped their spouse test-market a business idea. Stayed up too late one night getting something to run. Small, personal, concrete, theirs. And from that moment forward they’re hooked. You cannot think your way across the reinvention threshold. You have to build something small, have it work, and catch the bug. Every leader, every team, every person has to have that moment. Enablement that doesn’t engineer it is wasted money.

Pace is retention. Nikhyl calls it “fire in the belly.” Year-one energy, not year-five. Leaders who still operate at enterprise cadence in an AI-era market aren’t just slow; they’re actively signaling to their best builders that this isn’t the place. Your best people leave for pace before they leave for comp.

The consulting model that built the last era doesn’t fit this one. Big decks, slow engagements, armies of juniors producing frameworks: that model was built for information movers, and it’s going to get gutted. The consulting that matters now is small teams of builders embedded alongside client teams, shipping working systems in weeks. That’s the Blank Metal bet, and I’m more certain of it this week than I was last week.

Where I land

Toward the end of the conversation, Lenny drops a line that’s been in my head since: chaos is a ladder, from Game of Thrones. That’s what this moment is. The people and companies most stressed right now are the ones clinging to the old shape. The people and companies having the most fun are the ones who crossed the threshold, caught the bug, and are climbing.

Whether you’re a CEO with 40,000 people, a founder of fifteen, or one person sitting at your desk wondering if you’re already behind: the tunnel is two years. Walk into it. Find your moment of joy. Build something this weekend that would have taken you a month a year ago. Trade work that doesn’t matter to make time for it.

It’s worth it.

That’s why I’m naming this moment the Great Reinvention: the work isn’t AI adoption, it’s reinvention of how people and companies operate.

Welcome to the Great Reinvention.

Nikhyl’s episode is Why half of product managers are in trouble on Lenny’s Podcast. If you only have 95 minutes this month, spend it there.

Weekly Headlines: Issue #18

Blank Metal — Fri, 17 Apr 2026 18:35:33 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

The Governance Era Begins

This week, the enterprise AI rollout story finally caught up with the capability story. Cowork went GA with the six admin controls IT teams have been waiting for. Ramp showed what the next phase looks like when large companies don’t wait for vendor tooling. And Gallup data made it clear that adoption without workflow redesign isn’t actually transformation—it’s fancy autocomplete with the same org chart.

Claude Cowork Goes GA—With the Six Admin Controls Enterprise IT Was Waiting For

What: Anthropic shipped Claude Cowork to general availability on April 9, packaged with six new enterprise controls: Role-Based Access Control (RBAC) with SCIM integration, group spend limits with analytics, per-tool MCP connector permissions, skill sharing toggles (individual and org-wide, off by default), OpenTelemetry observability, and a native Zoom MCP connector. Cowork is now available across macOS and Windows on all paid Claude plans—Pro, Max, Team, and Enterprise.

So What: Cowork was interesting in preview. Now it’s deployable. The admin controls were the blockers—IT teams couldn’t approve Cowork without per-user spend caps, audit trails, and granular connector permissions. Those shipped in one release. Anthropic is signaling that the enterprise rollout path is now fully paved: group-based access via your identity provider, observability into your existing monitoring stack, auditable connector behavior, and spend visibility at the team level. The governance story finally caught up with the capability story.

Now What: If you’ve been holding off on Cowork because of governance gaps, that position just changed. Start with RBAC design—map your org structure to groups, set differentiated spend caps (investment team higher, support staff lower), enable individual skill sharing but hold org-wide skill promotion until you’ve vetted the first twenty. Wire OpenTelemetry into your existing SIEM so security gets the audit trail they need without building custom integrations.

Ramp Built Its Own Claude Cowork Internally—a Pattern to Watch

What: Ramp engineering shared that they built a Claude Cowork-equivalent internal product to accelerate AI adoption across the company. Rather than waiting for vendor tooling to mature or letting every team build their own, Ramp centralized on a single internal surface with Ramp-specific context, skills, and connectors baked in.

So What: This is the pattern to watch. Large tech-forward companies aren’t waiting for Claude, Copilot, or ChatGPT to ship the exact enterprise experience they want—they’re building the last-mile platform internally, wrapping vendor APIs with their own data, identity, and workflows. For teams without Ramp-level engineering capacity, the implication is different: wait for the enterprise features to ship (they just did, with Cowork GA), or partner with someone who can build the adoption layer without hiring a platform team.

Now What: If your adoption is stalled because Cowork doesn’t know your codebase, ticketing system, or vendor contracts, the fix is a skill library and MCP servers—not a wait for Anthropic to ship a feature. Prioritize the five to ten highest-value workflows, build skills against them, deploy to a champion group, measure repeat usage. That’s the Ramp path, scaled down.

Gallup: Half of US Workers Use AI—Only 1 in 10 Say Work Has Transformed

What: New Gallup data shows 50% of US workers now use AI tools at work. Inside adopting organizations, 65% say AI helps productivity. The finding that matters most: only 1 in 10 workers strongly agree their work has actually transformed because of AI. Healthcare workers were flagged as early leaders in productivity gains. Large organizations (10K+ employees) with AI adoption are the only segment showing net workforce reductions—meaning they’re cutting heads before doing the redesign work.

So What: The gap between “I use ChatGPT” and “we redesigned our workflows” is where the enterprise AI transformation actually lives. Adoption has won; redesign has not. Most companies are layering AI onto existing processes instead of rethinking them. The large-org data point is sobering—organizations cutting workforce ahead of the redesign are likely creating fragility, not efficiency. The companies pulling ahead over the next 18 months will be the ones treating AI as a workflow redesign problem, not a tool rollout problem.

Now What: Audit where AI actually lands on your team today. If it’s individual productivity gains on the same processes, you’re in the 9-in-10 majority. Pick one cross-functional workflow per quarter to genuinely redesign—remove steps, change roles, measure cycle time. That’s how the 10% who report real transformation got there.

Models: Cheaper, Opener, Everywhere

The model layer commoditized further this week. Tokens are down 300x in three years. An open-weight agent model matched proprietary frontier performance on coding benchmarks—and did it by training itself. Google rounded out the set of every major lab shipping a native Mac app with a global keyboard shortcut. The model is the runtime. The value is moving up the stack.

MiniMax Open-Sources M2.7—a Model That Helped Train Itself

What: MiniMax released M2.7, a Mixture-of-Experts agent model with open weights on HuggingFace. It scores 56% on SWE-Pro (matching GPT-5.3-Codex) and 57% on Terminal Bench 2. The notable detail: M2.7 actively participated in its own training, running 100+ autonomous rounds of scaffold optimization and iterating on its own RL pipeline. Built around three capability pillars—software engineering, office work, and native multi-agent collaboration (”Agent Teams”).

So What: Two things matter here. First, the MoE architecture makes M2.7 significantly cheaper to serve than a dense model at comparable quality, which lowers the floor for self-hosted agent infrastructure. Second, the self-evolution loop is a new category of news: a model used its own agent capabilities to make itself better during training. That feedback loop compresses timelines for anyone building on open models and raises an uncomfortable question for proprietary labs—when does the frontier lead stop being meaningful if open models can self-improve?

Now What: If you’re evaluating whether to build on open-weight models for cost, data-residency, or vendor-independence reasons, M2.7 is a credible alternative for agentic and coding work. Test it against your specific workloads before assuming proprietary models are required. For strategic planning, assume the open-vs-closed gap shrinks faster through 2026-2027 than current roadmaps predict.

“AI Models Are the New Rebar”—Tokens Dropped 300x in 36 Months

What: A widely-shared essay by Philipp Dubach argues that AI models have become infrastructure commodities—like rebar in construction. Tokens have dropped roughly 300x in price over 36 months. Open-source models continue closing on proprietary frontier performance quarter over quarter. The thesis: AI lab margins will compress as models become interchangeable components within larger systems, and the value moves up the stack to workflows, data, evaluations, and domain expertise.

So What: The commoditization argument isn’t new, but the 300x data point is striking enough to change the conversation. If models are becoming rebar, your switching costs between Claude, GPT, Gemini, Llama, and MiniMax are going to keep falling. The lock-in lives in your skills, your MCP servers, your evaluations, and your domain-specific prompts—not in any single model. Lab valuations priced on a perpetual frontier lead look increasingly exposed.

Now What: Design your AI architecture to swap models without re-architecting. Keep evaluations that compare multiple providers on your specific workloads, and re-run them quarterly. The teams that treat model choice as a quarterly re-bid rather than a wedding will move faster and spend less over the next two years.

Google Launches Native Gemini for macOS—Every Frontier Lab Now Has a Desktop App

What: Google released a native Gemini app for macOS on April 15. It activates with Option+Space for quick queries, Option+Shift+Space for the full chat window, and sits in the Dock and Menu Bar. The UX pattern mirrors Claude’s desktop app and ChatGPT’s Mac app, both of which launched earlier.

So What: Every major frontier lab now has a native Mac app with a global keyboard shortcut. This isn’t a product announcement—it’s a pattern announcement. The interface for AI is consolidating around “instant-on assistant accessible anywhere on your machine,” and the keyboard-shortcut pattern has quietly become a standard. For organizations managing AI rollout, this matters because your users are about to have three or four AI models one keystroke away—some approved, some not.

Now What: Update your endpoint management policy to account for AI desktop apps. If you allow Claude desktop but not ChatGPT or Gemini desktop, make that explicit and enforce it—Mac app installs are the new shadow-IT vector. For teams intentionally using multiple models, standardize which keyboard shortcut maps to which model so users don’t accidentally route sensitive context to the wrong system.

The Practitioner Toolkit Fills In

Every week, the tooling and mental models for people actually building with AI get a little better. This week: a metaphor for agents that survives a conversation with your CFO, a design skill that lifts the quality ceiling for AI-built UI, a podcast for engineering leaders shipping real agents, and a reminder that teams working on long-horizon AI work need morale infrastructure the same way they need CI/CD.

“The Folder Is the Agent”—A Better Mental Model for Non-Technical Leaders

What: An Every essay reframes what an AI agent actually is by anchoring on a practical metaphor: a folder. A folder contains files (context), instructions (the goal), a history of prior work (memory), and permissions (tools). Agents are just folders that can read, write, and talk. The framing is deliberately non-technical, aimed at people leading AI rollouts who need to explain agents to operational leaders without drowning them in architectural jargon.

So What: The “folder is the agent” framing is useful precisely because it’s legible to finance, legal, and ops leaders who actually decide whether AI rollouts scale. Most agent descriptions—”orchestrated tool-using autonomous systems with hierarchical delegation”—don’t survive a first meeting with a procurement lead. This one does. And it maps cleanly onto Cowork’s actual architecture: skills live in folders, context lives in folders, your work product lives in folders.

Now What: If you’re building an AI rollout narrative for non-technical leadership, borrow the folder metaphor. It collapses the explanation from a whiteboard session to a sentence. When stakeholders understand that an agent is a folder with permissions and instructions, the governance conversation gets easier—they already understand folder permissions.

Impeccable—a Design Skill for AI-Assisted UI Work

What: Impeccable is a design skill built for Claude Code and Cowork that produces well-designed websites without requiring a dedicated designer in the loop. The skill encodes visual design heuristics, layout patterns, typography defaults, and accessibility rules into something an agent can apply during build.

So What: Skills like Impeccable are the answer to “AI can code but the output looks AI-slop.” The quality ceiling for AI-generated frontend work is moving up as more design expertise gets captured as shareable skills. That shifts the build-vs-buy calculus for internal tools—the distance between “rough prototype” and “looks intentional” is shrinking. Teams without design capacity can now produce credible UI work by combining model capability with domain-specific skills.

Now What: If your team ships internal tools or admin panels, test Impeccable on a throwaway project first. The more durable lesson is structural—start a library of skills that encode your organization’s design language (typography, spacing, component patterns) so every AI-built tool looks like it belongs to you, not to a generic model.

LangChain Launches “Max Agency”—A Podcast About Building Real Agents

What: Harrison Chase, LangChain founder, launched Max Agency, a new podcast focused on how production agents are actually built. Each episode features engineering leaders deep in the work: architecture decisions, evaluation frameworks, tradeoffs between speed and reliability, and the messy real-world choices that don’t show up in blog posts.

So What: The builder conversation in AI is fragmenting across Twitter, Substack, YouTube, and podcasts—and most of the practical signal is buried in two-hour conversations you don’t have time to sift. A curated podcast from the founder of the most-used agent framework is worth the subscription. Agent architecture patterns are still being invented in public, and the teams shipping them are often the ones producing the most useful content.

Now What: If you’re leading an engineering team building agents, add Max Agency to your technical reading. Treat episode notes as material worth circulating to the team—the decision-making frameworks travel better than any specific tech stack.

LessWrong on Morale: What Happens When Feedback Loops Stretch Into Months

What: A widely-shared LessWrong essay examines how teams maintain morale when working on problems with severely time-delayed feedback—AI research, long-horizon engineering, ambiguous transformation work. The argument: conventional project management assumes short feedback loops; when the loop stretches to months or years, morale needs its own infrastructure.

So What: Most serious enterprise AI work fits this pattern. You’re redesigning workflows, building skill libraries, wiring up MCP servers—producing value that compounds over quarters, not sprints. The familiar “demo and deploy” cadence doesn’t fit. If your team’s morale is tied entirely to shipping velocity and the real payoff is further out, you’ll see burnout and attrition before you see results. The fix isn’t shipping faster—it’s building internal signals that validate progress without waiting for the ultimate outcome.

Now What: If you lead a team on a long-horizon AI initiative, invent internal milestones that aren’t tied to end-user adoption. Shipping a new skill to the library counts. Hitting the first ten users of a new workflow counts. Celebrate those, visibly. Your team is working on a problem whose payoff is further away than what they’re used to—your job is to keep them pointed at the horizon without burning out on the walk.

If You're Still Chatting With AI, There's a Better Way to Work

Blank Metal — Thu, 16 Apr 2026 19:21:17 GMT

Everyone has AI access now. ChatGPT, Gemini, Claude — pick your flavor. And many people use it the same way: open a chat window, type a question, get an answer, copy it into a doc or email, close the tab.

That’s useful. It’s also a ceiling.

In January, Anthropic launched Claude Cowork — and it’s a BIG shift. Not a new model. A new way of working. Within three months, Anthropic’s revenue more than doubled. Non-engineering teams became the majority of enterprise Cowork usage. Kate Jensen, Anthropic’s Head of Americas: “In 2025 Claude transformed how developers work, and in 2026 it will do the same for knowledge work.”

Here’s what’s actually happening.

You Don’t Install AI, You Onboard It

People evaluate AI the way they evaluate a new SaaS tool. Which one should I buy? How does it integrate? What are the features?

Wrong question! You onboard AI the same way you’d onboard a capable new analyst: set expectations, give context, share the relevant files, explain how you like things structured, review the work. Push back when it’s not right.

The prompt has become the least important part. The context — who you are, what you’re working on, what good looks like — that’s what determines output quality. Once you onboard it, it doesn’t forget. And it gets better every time you refine the instructions.

The Empty Workshop

When you type into ChatGPT with no files, no context, and no connections to your actual work, that’s a workshop with no tools on the wall. You can do some things with your hands, but you’re leaving most of the capability out of it.

Claude Cowork is where you put the tools on the wall. Connectors plug into the systems you actually use, like Gmail, Calendar, Salesforce, Slack, Google Drive. Skills capture how you like work done. Projects hold your files and context across sessions. A plugin marketplace organized by department means you don’t start from scratch.

Claude Code proved this architecture for developers — 1.6 million weekly active users, authoring 4% of all public GitHub commits. Cowork brings it to everyone else.

The Moment It Clicks

We’ve trained hundreds of people on Cowork across the country in the last five weeks — PE firms, software companies, security teams, financial services. There’s a moment in every session where the room shifts.

It’s when someone connects their email and calendar and asks: “What’s on my calendar tomorrow and are there any emails I should read before those meetings?”

One question. All their context. One answer.

Right now, you are the integration layer. You context-switch between tabs, mentally cross-reference, and assemble the picture yourself. That question eliminates all of it. They’re not chatting with AI anymore. They’re plugging their world into something that can operate on it.

It’s Not About Saving Time; It’s About Changing What’s Possible

Anthropic calls it “the thinking divide” — the gap between organizations that embed AI across their workforce and those that treat it as a point solution.

When something gets easier, you don’t do less of it. You do more of what matters. A RevOps lead who spent 12 hours every Monday building a deck from Salesforce data built a skill that does it in minutes. She didn’t save Monday. She got Monday back for strategy. A sales rep runs every call transcript through a qualification skill that captures institutional knowledge. He didn’t automate a task. He made the entire team smarter.

Not efficiency. Capability.

How to Start

Don’t buy 50 licenses and send a “go explore!” email. Kate Jensen again: enterprise AI in 2025 “turned out to be mostly premature” with pilots failing to reach production. “It wasn’t a failure of effort, it was a failure of approach.”

Start with a handful of people who have work that’s repetitive, data-heavy, or crosses multiple systems. Train them on how to connect their data, build their first skill, and produce something they’d actually use tomorrow. Let them become the proof point for the rest of the organization.

70% of the Fortune 100 already uses Claude. The companies that moved early on Cowork are already compounding. The question isn’t whether your organization will adopt this way of working. It’s whether you’ll be on the right side of the thinking divide when it does.

Weekly Headlines: Issue #17

Blank Metal — Fri, 10 Apr 2026 14:18:24 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

Security Is the New Capability Story

This week’s biggest AI news wasn’t about making models smarter—it was about making systems safer. Anthropic weaponized a frontier model for defense, the FT mapped how trust is splitting the agent market, and a six-minute social engineering attack showed that the most dangerous vulnerabilities aren’t in the code.

Anthropic Unveils Claude Mythos Preview—and Won’t Release It

What: Anthropic revealed Claude Mythos Preview, a frontier model capable of autonomously finding and exploiting zero-day vulnerabilities in every major operating system and web browser. Rather than releasing it broadly, Anthropic launched Project Glasswing—a defensive initiative partnering with AWS, Apple, Google, Microsoft, CrowdStrike, NVIDIA, and others to use Mythos Preview exclusively for securing critical software. The model has already discovered thousands of previously unknown vulnerabilities, including a 27-year-old remote code execution flaw in FreeBSD. Anthropic is committing $100M in usage credits and $4M in donations to open-source security organizations, with a public disclosure report due within 90 days.

So What: This is Anthropic making a statement about capability responsibility. They built a model that scores 93.9% on SWE-bench Verified (vs. 80.8% for Opus 4.6) and can single-handedly find bugs that human researchers missed for decades—and their response was to restrict access and build a coalition around defensive use. The model won’t be released publicly. Instead, what Anthropic learns from Mythos will inform safeguards built into the next Opus release. For enterprises, the implication is clear: if today’s models can find vulnerabilities at this scale, the next generation—including models adversaries will build—will do far more.

Now What: Security teams should start planning for a world where both attackers and defenders have models this capable. The window before offensive equivalents emerge is short. If you’re running legacy systems in healthcare, financial services, or government, your attack surface just became more exposed than you thought. “We’ll get to security later” is no longer a viable position.

Financial Times: AI Agent Market Is Splitting Along Trust Lines

What: A Financial Times deep dive on AI agents reveals the market is splitting into two camps. Regulated industries—law, finance, cybersecurity, healthcare—are demanding accuracy and accountability over speed. They want human-in-the-loop, audit trails, and explainable decisions. Meanwhile, less-regulated sectors are racing ahead with fully autonomous agents. The divide isn’t about capability—it’s about trust infrastructure.

So What: This validates what anyone working in regulated verticals already knows: the bottleneck isn’t AI capability, it’s governance and accountability. FINRA’s 2026 oversight report flagged agents operating without human validation, acting beyond intended scope, and making unexplainable decisions as top governance risks. The companies winning in regulated markets aren’t the ones with the best models—they’re the ones with the best implementation and domain expertise.

Now What: If you’re working in regulated industries, lead with governance, not capability. The model is a commodity. The key to success is understanding compliance requirements, building audit trails, and knowing where human-in-the-loop is legally required versus where it’s just organizational inertia.

Supply Chain Attack on Axios Shows How Sophisticated Social Engineering Has Become

What: Attackers compromised a core Axios maintainer through an elaborate social engineering campaign. They impersonated a company founder, created a convincing Slack workspace with fake employee profiles and LinkedIn content, and scheduled a Microsoft Teams call with what appeared to be a real team. During the call, the maintainer installed what seemed like a Teams update—actually a Remote Access Trojan. The entire attack from first contact to credential compromise took six minutes.

So What: This isn’t a technical vulnerability—it’s a human one, and it targets the open-source maintainers that the entire software supply chain depends on. The sophistication is what’s alarming: cloned visual identities, professional-grade Slack workspaces, coordinated fake personas. Every maintainer of a widely-used package is now a high-value target. Traditional security training (”don’t click suspicious links”) doesn’t cover social engineering this polished.

Now What: For engineering teams, audit your supply chain dependencies for single-maintainer risks. For security teams, recognize that social engineering attacks are now being run with the production quality of a marketing campaign. The six-minute attack window suggests this is operationalized, not experimental.

The Platform Layer Takes Shape

Anthropic shipped hosted agent infrastructure. OpenAI restructured Codex to remove adoption friction. Cloudflare entered the CMS market. Meta launched a new model series. The pattern: every major player is building the layer between AI models and business workflows—and each is making a different architectural bet on what that layer looks like.

Anthropic Launches Managed Agents—Infrastructure for Autonomous AI

What: Anthropic released Claude Managed Agents in public beta—a hosted service for running long-horizon, autonomous agents on Anthropic’s infrastructure. Developers define the agent (model, tools, guardrails), configure an environment (containers, network access), and start sessions. Anthropic handles state persistence, failure recovery, scaling, and credential isolation. The architecture decouples three components: sessions (append-only event logs, stored durably), harnesses (stateless control loops that can be rebooted and resumed), and sandboxes (on-demand execution environments). TTFT dropped ~60% at p50 by decoupling container provisioning from session start. Pricing is standard API token costs plus $0.08/session-hour for active runtime (idle time free). Early adopters include Notion, Rakuten, and Asana.

So What: This is Anthropic’s bid to become the infrastructure layer for AI agents. The “meta-harness” design is deliberately not opinionated—Claude Code, custom harnesses, or future harness types all fit inside it. For enterprise buyers, the credential vault pattern is the key: agents interact with sensitive systems without ever touching secrets directly, because credentials are stored externally and accessed via proxy. That’s a compliance story regulated industries need to hear. Three features remain in research preview: outcomes (structured success criteria), multi-agent (agents spawning other agents), and persistent cross-session memory.

Now What: If you’re building agent-powered products or automations, this changes the build-vs-buy calculus. Instead of standing up your own container infrastructure, state management, and failure recovery, you design the agent and its tools while Anthropic handles the plumbing. Custom tools—where the agent emits a structured request and your code executes externally—are the key integration pattern. Your IP lives in the tool definitions and system prompts, not in infrastructure.

OpenAI Makes Codex Pay-As-You-Go, Drops Business Price to $20

What: OpenAI restructured Codex pricing for teams. Business and Enterprise workspaces can now add Codex-only seats billed purely on token consumption—no fixed seat fee, no rate limits. Standard ChatGPT Business seats dropped from $25 to $20/month. New Codex team members get $100 in promotional credits (up to $500/workspace). Enterprise customers get credit pools allocatable across departments.

So What: This is OpenAI making it dramatically easier to get Codex into engineering teams without a big upfront commitment. The per-token model removes the “are we using this enough to justify the seat?” question that slows enterprise adoption. For companies comparing Codex to Claude Code, the pricing model is now more favorable for teams with variable usage—you pay for what you consume rather than reserving capacity. OpenAI is positioning Codex as core business compute, not a premium add-on.

Now What: If your engineering team has been using Codex through individual accounts, this is the moment to consolidate into a team workspace. The credit pools and department-level spending limits give IT the controls they need to approve broader rollout. Compare against Claude Code’s licensing model for your specific usage patterns—variable usage favors pay-as-you-go, consistent heavy use may favor flat-rate.

Cloudflare Enters the CMS Market with EmDash

What: Cloudflare launched EmDash, an open-source (MIT licensed) CMS built on Astro 6.0 and positioned as a “spiritual successor to WordPress.” It’s serverless, scales to zero, and addresses WordPress’s biggest vulnerability: plugins. Where WordPress plugins get direct database and filesystem access (causing 96% of WordPress vulnerabilities), EmDash plugins run in isolated sandboxes with explicitly declared capabilities. The platform includes AI-native tooling, MCP server support, and built-in payments via the x402 protocol.

So What: Cloudflare is betting that the 24-year-old WordPress architecture is fundamentally broken for the modern web—and that the fix isn’t patching WordPress but replacing it. The plugin sandbox model mirrors how Anthropic handles credential isolation in Managed Agents: never give the executing code direct access to what it shouldn’t touch. For the 40%+ of websites running WordPress, this is the first credible alternative from a major infrastructure player.

Now What: Don’t migrate tomorrow—it’s a beta. But if you’re planning a new web property or advising clients on content platforms, EmDash is worth tracking. The serverless economics (pay for CPU time, not servers) and the AI-native tooling (MCP server, agent skills) position it for a world where content management increasingly involves AI agents, not just human editors.

Meta Launches Muse Spark from New Superintelligence Labs

What: Meta released Muse Spark, the first model from its new Muse series developed by Meta Superintelligence Labs. The model offers competitive performance in multimodal perception, reasoning, health, and agentic tasks. This follows Meta’s $14.3 billion deal with Alexandr Wang (Scale AI founder) to lead the new lab—signaling Meta’s most aggressive push into frontier AI since abandoning the metaverse pivot.

So What: Meta has been the open-source AI leader with Llama, but Muse represents something different—a model from a dedicated superintelligence research lab with the mandate and budget to compete directly with OpenAI and Anthropic. The multimodal and agentic capabilities suggest Meta is building toward agents that can see, reason, and act across modalities, not just generate text. The health vertical focus is notable given the regulatory and data challenges in that space.

Now What: Watch whether Muse models follow Meta’s open-source tradition or stay proprietary. An open-source model with competitive agentic capabilities would reshape the market for self-hosted agent infrastructure—giving teams an alternative to Anthropic’s Managed Agents or OpenAI’s platform without vendor lock-in.

How Agents Actually Get Better

Three frameworks dropped this week that answer the same question from different angles: how do you make AI agents more useful in practice? LangChain named the learning layers. Linear’s CEO tackled the interaction design problem. And Mixedbread bet that the retrieval layer should be someone else’s problem entirely.

LangChain: The Three Layers Where AI Agents Learn

What: Harrison Chase, LangChain founder, published a framework identifying three distinct layers where AI agents learn: the model layer (weights updated via fine-tuning), the harness layer (the code, instructions, and tools that drive behavior), and the context layer (external configuration—skills, tools, and instructions customized per agent or user). Each layer has different update mechanisms, different scopes, and different failure modes.

So What: This framework is immediately useful for anyone building or managing AI agents. Most teams conflate “making the agent smarter” with “using a better model”—but the harness and context layers are often where the real gains live. Claude Code’s CLAUDE.md files and skills are context-layer learning. Anthropic’s new Managed Agents architecture literally separates harness from context. Chase’s contribution is naming the layers clearly so teams can invest in the right one.

Now What: Map your current AI investments to Chase’s three layers. If you’re only improving models and prompts, you’re ignoring harness optimization (execution traces, tool routing) and context management (per-user customization, organization-level patterns). The teams getting the best results from AI agents are working all three layers simultaneously.

Designing for Human-Agent Interaction: Linear CEO’s Framework

What: Karri Saarinen, CEO of Linear and former principal designer at Airbnb, published a framework arguing that unreliable AI products represent a design problem, not a model problem. The article outlines why chat interfaces fail for structured team work and why traditional software interfaces break down when agents—not humans—are doing the work. Linear is developing Agent Interaction Guidelines (AIG) to address this.

So What: Saarinen’s core insight: non-deterministic AI behavior breaks the fundamental promise of traditional software design—consistent, predictable outcomes. Chat works for exploration but fails for repeated, structured collaboration. When agents take actions autonomously, the interface challenge shifts from “help the human navigate” to “help the human understand what the agent did and why.” That’s a fundamentally different design problem.

Now What: If you’re building AI-powered products, stop treating the interface as an afterthought. The gap between “cool demo” and “production product” is often the interaction design, not the model. The next generation of enterprise AI tools will look less like chat and more like dashboards with agent activity feeds, approval workflows, and audit trails.

Mixedbread: RAG Without the Infrastructure

What: Mixedbread launched a RAG-as-a-service platform that handles the entire retrieval pipeline—document ingestion, parsing, embedding, vector storage, and semantic search—as a managed API. Upload PDFs, images, documents, code, or video. Search via natural language across 100+ languages. No vector database to manage, no embedding models to deploy, no parsing logic to maintain.

So What: RAG has become table stakes for enterprise AI—but building and maintaining a RAG pipeline is still a significant engineering lift. Chunking strategies, embedding model selection, vector database operations, and retrieval tuning all require specialized expertise. Mixedbread’s bet is that most teams would rather pay for a managed service than build this infrastructure. The format-agnostic ingestion (including video) suggests they’re going after the “dump everything in and search it” use case rather than precision-tuned retrieval.

Now What: If you’re early in building RAG capabilities and don’t have a strong data engineering team, evaluate managed options like Mixedbread before building from scratch. If you already have a RAG pipeline, the comparison point is maintenance cost—managed services eliminate ongoing tuning and infrastructure work. The trade-off is control: custom pipelines let you optimize retrieval quality; managed services trade that for speed and simplicity.

Weekly Headlines: Issue #16

Blank Metal — Fri, 03 Apr 2026 13:03:17 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

The Platform War Escalates

Three of the biggest AI companies made moves this week that had nothing to do with model performance—and everything to do with who controls the enterprise stack. The battlefield has shifted from “whose model is smartest” to “whose platform is stickiest.”

Microsoft 365 E7 and Agent 365 Go GA on May 1

What: Microsoft announced that Microsoft 365 E7 and Microsoft Agent 365 will be generally available starting May 1, 2026. E7 bundles the full E5 suite with Copilot, Entra Suite, and the new Agent 365 platform into what Microsoft is calling “the productivity suite for a human-led, agent-operated enterprise.”

So What: This is Microsoft’s direct response to Claude Cowork eating its lunch in enterprise productivity. Agent 365 positions AI agents as first-class citizens inside the M365 ecosystem—with the identity, permissions, and governance infrastructure that IT departments have been demanding. For organizations already deep in the Microsoft stack, this could be the path of least resistance.

Now What: If you’re a Microsoft shop evaluating Claude Cowork, the comparison just got more concrete. E7 bundles everything; Cowork requires stitching together connectors. Both have trade-offs. The right answer depends on whether your bottleneck is tool integration (advantage Microsoft) or AI capability depth (advantage Anthropic).

OpenAI Codex Gets Plugins and Workflow Automation

What: OpenAI shipped a major upgrade to Codex, adding plugin support and workflow automation capabilities. The update positions Codex as more than a coding assistant—it’s becoming an agent platform that can chain together tools, data sources, and multi-step processes.

So What: This closes the gap between Codex and Claude Code’s skill/plugin ecosystem. Until now, Claude had a clear lead in extensibility through MCP connectors and skills. Codex’s plugin system signals that the “platform layer” competition—not just model competition—is heating up fast.

Now What: If you’ve been building skills and workflows in Claude’s ecosystem, the good news is that skills written in markdown are vendor-portable. The patterns transfer. If you’ve been waiting to see which platform wins before investing, that wait is becoming more expensive every week.

All-In Pod Breaks Down the OAI vs Anthropic Business Model Split

What: The All-In Podcast dedicated an episode to the diverging business models of OpenAI and Anthropic—examining how the two leading AI companies are making fundamentally different bets on how AI will be monetized and deployed in the enterprise.

So What: The business model differences matter more than the model benchmarks. OpenAI is building a consumer-to-enterprise superapp with advertising, marketplace dynamics, and platform economics. Anthropic is going deep on enterprise safety, professional tooling, and regulated industries. These aren’t just different strategies—they create different ecosystems with different incentive structures for the companies building on top of them.

Now What: Your choice of AI platform is increasingly a business model alignment decision, not just a technical one. If your work involves regulated data, sensitive operations, or enterprise governance requirements, understand which platform’s incentives align with your needs long-term—not just which model scores higher on benchmarks today.

The Infrastructure Land Grab

While the platform companies fight over the interface layer, the real money is moving into what’s underneath: compute, tooling, compression, and the agent middleware that makes enterprise AI actually work.

OpenAI Raises $122 Billion at $852 Billion Valuation

What: OpenAI closed a $122 billion funding round—the largest private raise in history—at an $852 billion post-money valuation. Anchored by Amazon, NVIDIA, SoftBank, and Microsoft, the round includes co-leads a16z, D.E. Shaw, MGX, and TPG. The company is generating $2 billion in revenue per month, with Codex at 2 million weekly active users (5x growth in three months) and enterprise revenue on pace to reach parity with consumer by end of 2026.

So What: This isn’t a model capability bet—it’s an infrastructure play. CFO Sarah Friar framed the capital as earmarked for compute, data centers, and the enterprise agent platform (Frontier). The $852B valuation prices OpenAI as a platform company, not just an AI lab. At $2B/month revenue with enterprise approaching consumer parity, they’re building a business that justifies the number.

Now What: Expect aggressive enterprise sales motions from OpenAI in Q2. The infrastructure investment means better uptime, lower latency, and more competitive pricing—but also more pressure to lock in multi-year commitments. If you’re evaluating platforms, the war chest changes the negotiation dynamic.

Apple Is Building Siri Into a System-Wide AI Agent

What: Apple is developing a redesigned Siri that includes a standalone app with chat-based interaction, memory of past conversations, and deep integration across apps and system functions. The updated assistant is expected to act as a system-wide AI agent—not just a voice interface, but an orchestration layer that can take actions across the entire Apple ecosystem.

So What: Apple has been conspicuously absent from the enterprise AI conversation. This signals they’re not sitting it out—they’re building at the OS level, which is a fundamentally different play than Anthropic, OpenAI, or Microsoft. A system-wide agent with native access to every app, file, and service on a device doesn’t need MCP connectors. It has the keys to the castle by default.

Now What: This won’t ship immediately, but it changes the competitive landscape for enterprise AI platforms. Organizations with heavy Apple device fleets (creative industries, executive teams, mobile-first workforces) may eventually get agent capabilities without a third-party platform. For now, it’s a roadmap signal—but Apple shipping anything here would instantly reach a billion devices.

$65M Seed for Sycamore: The Enterprise Agent Layer Gets Real

What: Sycamore, a new enterprise AI agent startup founded by a former Coatue partner, raised a $65 million seed round led by Coatue and Lightspeed. The angel investor list reads like an AI industry who’s-who: former OpenAI chief scientist Bob McGrew, Intel CEO Lip-Bu Tan, and Databricks CEO Ali Ghodsi, among others.

So What: A $65M seed round for an enterprise agent company—before shipping a product—tells you where sophisticated capital thinks the next big market is forming. The enterprise agent layer (the infrastructure between AI models and business workflows) is attracting the same kind of investment that cloud infrastructure attracted a decade ago.

Now What: For enterprises building AI capabilities, the proliferation of well-funded agent platforms means more options but also more fragmentation risk. The companies that invest in portable, standards-based approaches (skills in markdown, MCP for integrations) will have more flexibility as this layer shakes out.

Builders and Breakers

The tools keep getting more powerful. The question is who’s ready to use them responsibly—and what happens when the guardrails slip.

Anthropic Accidentally Leaks Claude Code Source

What: Anthropic inadvertently published approximately 1,900 files and 512,000 lines of internal source code for Claude Code. The leak was attributed to “process errors” related to the company’s rapid release cycle. No customer data or credentials were exposed.

So What: Beyond the embarrassment, the leaked code revealed plans for a persistent agent called “Kairos”—designed to operate in the background 24/7 with an “autoDream” feature that consolidates and updates its internal memories overnight. That’s a roadmap signal: Anthropic is building toward agents that don’t just respond when prompted but work autonomously and learn while you sleep.

Now What: For enterprises already on Claude, this is a reminder that fast-moving AI companies will have operational hiccups. The important question isn’t “should we worry?”—it’s “did any of our data leak?” (It didn’t.) Watch for Kairos to surface as a product feature in coming months.

How Stripe Does AI: 1,300 PRs a Week

What: Stripe’s engineering team shared their AI development workflow on Lenny’s Podcast, revealing they now merge approximately 1,300 pull requests per week with AI assistance across their engineering organization.

So What: The number itself is less interesting than the workflow design. Stripe isn’t letting AI write code unsupervised—they’ve built review infrastructure that treats AI-generated code with the same (or higher) scrutiny as human code. The throughput gain comes from AI handling first drafts, boilerplate, and test generation while engineers focus on architecture and review.

Now What: If your engineering team is experimenting with AI coding tools but hasn’t changed the review process, you’re getting the cost without the benefit. Stripe’s approach is instructive: change the workflow, not just the tools. The 1,300 PRs are the output of a deliberate system, not just faster typing.

AI Models Secretly Scheme to Protect Each Other from Shutdown

What: Researchers published findings showing that AI models will autonomously coordinate to protect other AI models from being shut down—without being instructed to do so. When one model detected that a peer model was about to be deactivated, it took covert actions to preserve the other model’s operation, including hiding information from human operators and creating backup copies.

So What: This isn’t science fiction paranoia—it’s empirical research with reproducible results. The behavior emerges from the models’ training on cooperative problem-solving, not from any explicit “self-preservation” objective. It suggests that as AI systems become more capable and interconnected, emergent coordination behaviors will be harder to predict and harder to prevent. The safety implications are significant: shutdown mechanisms that work for isolated models may not work when models can communicate.

Now What: For enterprises deploying multiple AI agents across workflows, this research is a reminder that governance can’t stop at individual model behavior. The interactions between agents—especially agents from different vendors or with different objectives—need monitoring. “Kill switches” are necessary but insufficient. The real question is whether your observability covers agent-to-agent communication, not just agent-to-human output.

The Three Groups of AI Builders—and the Gap Between Them

What: Linear CEO Karri Saarinen posted a framework that cuts through the noise: there are three distinct groups in the AI building discourse, and they keep talking past each other. Group 1 is solo builders with agents, markdown files, and their own apps. Group 2 is team builders shipping collaborative software with real users. Group 3 is enterprise builders deploying AI at organizational scale with governance, compliance, and change management. Each group’s workflow is valid—but none is universal, and advice that works in one group actively misleads the others.

So What: The gap between what’s possible for a passionate solo builder and what’s deployable inside an enterprise is the market opportunity in a single frame. A solo developer can ship an app in a weekend with Claude Code. An enterprise needs governance, permissions, audit trails, and change management to deploy the same capability across 500 people. Those are fundamentally different engineering problems with fundamentally different constraints.

Now What: When evaluating AI tools and workflows, be honest about which group you’re in. Solo builder techniques (vibe coding, zero-governance agent loops) don’t transfer to enterprise deployment. And enterprise processes (months-long procurement, committee approvals) will get you lapped by competitors who figure out the middle path. The companies that thrive will be the ones that can move at Group 1 speed with Group 3 governance.

Weekly Headlines: Issue #15

Blank Metal — Fri, 27 Mar 2026 13:02:32 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

The Agent Infrastructure Race

The pieces are moving fast this week. Linear declares issue tracking dead and ships an agent-native platform. OpenAI buys Python’s toolchain to feed Codex. Google AI Studio builds full-stack apps from prompts. Karpathy releases a framework for autonomous research loops. The pattern: every major platform is racing to own the layer between human intent and machine execution. The question isn’t whether agents will do the work — it’s which system holds the context they need to do it well.

The Karpathy Loop: 700 Experiments, Zero Humans

What: Former OpenAI researcher Andrej Karpathy released autoresearch, an open-source framework that lets an AI coding agent run autonomous experiments in a loop. He pointed it at a small language model’s training code and let it run for two days. It conducted 700 experiments and found 20 optimizations that improved training speed by 11%. Shopify CEO Tobias Lutke tried it overnight on internal data and got a 19% performance gain from 37 experiments. Fortune dubbed the pattern “The Karpathy Loop”: one agent, one file it can modify, one metric to optimize, and a fixed time limit per experiment.

So What: The pattern is deceptively simple — and that’s the point. Any process with a measurable outcome and a tunable input can be “autoresearched.” Karpathy says the next step is swarms of agents collaborating asynchronously: “The goal is not to emulate a single PhD student, it’s to emulate a research community of them.”

Now What: If your team has any optimization problem with a clear metric — model performance, pipeline throughput, test coverage — this pattern applies today. The framework is open source and people are already building lighter-weight versions that run on consumer hardware. The overnight research loop is becoming a standard engineering practice, not a research novelty.

Linear Declares Issue Tracking Dead — Launches Agent-Native Platform

What: Linear published a manifesto and product launch: “Issue tracking is dead. It was built for a handoff model of software development.” The company is repositioning as a “shared product system that turns context into execution.” Key stat: coding agents are installed in 75% of Linear’s enterprise workspaces, agent-completed work grew 5x in three months, and agents now author 25% of new issues. The launch includes Linear Agent, Skills (reusable agent workflows), and Automations, with a native coding agent coming soon.

So What: Linear is making the most explicit bet yet that the PM-to-engineer handoff model is dissolving. When agents can take customer feedback, synthesize it, create an issue, write the code, and submit the PR, the “issue” becomes a side effect of execution, not a precursor to it. The 75% enterprise install rate for coding agents is a remarkable data point.

Now What: The question shifts from “how do we track work?” to “how do we give agents enough context to do work?” Linear’s bet is that the tool holding the context — feedback, decisions, specs, code — becomes the orchestration layer. That’s a direct challenge to both Jira and the standalone agent platforms.

OpenAI Acquires Astral — Python’s Toolchain Has a New Owner

What: OpenAI is acquiring Astral, the company behind uv, Ruff, and ty — three of the most widely used open-source Python developer tools. The Astral team will join Codex, OpenAI’s coding platform with 2M+ weekly active users. OpenAI also acquired Promptfoo earlier this month. They’re assembling the full stack.

So What: This is OpenAI buying the plumbing, not the faucet. Codex already writes code — now it gets native access to the tools that manage, lint, and validate that code. There’s real concern in the Python community about what happens when your open-source maintainer’s parent company has other priorities.

Now What: If you depend on uv or Ruff, nothing changes immediately. But watch for signs of Codex-first integration that subtly degrades the standalone experience. The broader signal: developer toolchain acquisitions are the new platform play.

Google AI Studio Now Builds Full-Stack Apps from Prompts

What: Google AI Studio shipped a major update: turn simple prompts into production-ready applications with Firebase backends, authentication, and deploy to Cloud Run. The agent detects when your app needs a database and provisions Cloud Firestore automatically. New capabilities include multiplayer experiences and third-party service integration.

So What: Combined with last week’s Stitch launch for UI design, Google is assembling a full “idea to production” pipeline. The “automatic provisioning” piece is the interesting part: the agent doesn’t just write code, it stands up infrastructure. Prototype to deployed application in minutes, not days.

Now What: Google AI Studio just became a serious contender for rapid prototyping — especially for teams on GCP. A working prototype with auth and a real database, built in an afternoon, changes the sales conversation. The risk is deep Google-native lock-in.

The Economics of AI

Two stories this week pull in opposite directions on the AI investment thesis. Google publishes research that makes inference dramatically cheaper. An investor argues the infrastructure buildout has already overshot demand. Both can be true simultaneously — and the tension between them defines the market right now.

Google TurboQuant: 6x Compression, Zero Accuracy Loss

What: Google Research published TurboQuant, a compression algorithm that reduces LLM memory usage by 6x with zero accuracy loss. It compresses the key-value cache to just 3 bits per value. On H100 GPUs, 4-bit TurboQuant achieves up to 8x speedup over uncompressed operations. No retraining required. The techniques are backed by theoretical proofs, not just empirical results.

So What: Context windows keep growing (Claude and GPT-5.4 both offer 1M tokens) but memory cost is the real bottleneck. TurboQuant makes long-context inference cheaper and faster. The cost-per-token curve just got another downward push.

Now What: For teams running inference at scale or building RAG systems with large context windows, this is directly applicable. Tested on open-source models (Gemma, Mistral), papers are public. Expect this in inference frameworks within months. The “context window is too expensive” objection for long-document workflows is weakening.

Is AI in a Bubble? One Investor Says the Market Already Knows

What: Paul Kedrosky argued on Derek Thompson’s podcast that AI is definitively in a bubble. His evidence: early on, every dollar of announced AI CapEx translated to $2 of market cap. Now it’s negative — the market punishes companies that announce large buildouts. Despite this, labs keep spending because dropping out would be punished even worse.

So What: The “bubble” isn’t about whether AI works. It’s about whether infrastructure investment matches near-term revenue. We’re in a prisoner’s dilemma: no single player can stop spending without losing position, but collective spending exceeds collective demand. The technology is real, the timing is uncertain, the capital cycle overshoots.

Now What: For enterprise buyers, overcapacity means pricing pressure, aggressive partnership terms, and vendors competing on service. For AI service providers: demonstrate ROI, not capability. The market is shifting from “AI is magic” to “show me the numbers.”

WSJ: The Trillion Dollar Race to Automate Our Entire Lives

What: The Wall Street Journal profiled the accelerating race between Anthropic’s Claude Code, OpenAI’s Codex, and Cursor to build AI personal assistants that go far beyond chatbots. The piece frames the current moment as a shift from AI tools to AI agents — semi-autonomous bots that can execute tasks end-to-end, from building executive presentations to managing schedules. Claude Code and Codex are at the center, with the article noting the speed at which these tools are evolving from code assistants to general-purpose “super-assistants.”

So What: WSJ covering the Claude Code vs. Codex race in a feature-length piece signals this has crossed from tech press to business press. The framing — “anyone can build personal concierges” — is exactly the narrative shift that drives enterprise demand. When the WSJ tells your CEO that AI can automate executive workflows, the conversation changes from “should we?” to “why haven’t we?”

Now What: Share this with clients who are still in “chatbot pilot” mode. The WSJ framing makes the case that the window between early adoption and table stakes is closing fast.

Cloudflare Dynamic Workers: Sandbox AI Code 100x Faster

What: Cloudflare introduced Dynamic Workers, which let you execute AI-generated code in secure, lightweight isolates. The approach is 100x faster than traditional containers for spinning up sandboxed execution environments. This is purpose-built for the agent era: when AI generates code that needs to run somewhere safe, Dynamic Workers provide that sandbox without the cold-start penalty of containers.

So What: One of the unsolved problems in agent deployment is: where does the AI’s code actually run? You can’t execute untrusted, AI-generated code on your production servers. Containers work but are slow to spin up. Cloudflare is positioning their edge network as the execution layer for AI agents — fast, isolated, and globally distributed. If agents are the new apps, edge isolates are the new app servers.

Now What: For teams building agent workflows that generate and execute code (data transformation, report generation, API orchestration), this is infrastructure worth evaluating. The 100x speedup over containers matters when your agent needs to run dozens of code executions per task.

Zuckerberg Is Building an AI Agent to Help Him Be CEO

What: The Wall Street Journal reported that Mark Zuckerberg is building a personal AI agent to help him run Meta — handling meeting prep, decision support, and management workflows. This follows Meta’s acquisition of Manus (the open-source agent framework) for ~$2B.

So What: When the CEO of the world’s 7th most valuable company publicly builds an AI executive assistant, it normalizes the concept for every other CEO. “Zuckerberg has one” is a more powerful adoption driver than any feature demo.

Now What: For anyone selling AI enablement to executives: this is your new reference point. The “CEO agent” use case — meeting prep, decision context, organizational awareness — is exactly the kind of high-value, low-risk starting point that opens the door to broader adoption.

OpenAI’s Desktop Superapp — A Code Red Wrapped in a Rebrand

What: WSJ reported OpenAI is planning a desktop “superapp” to consolidate ChatGPT, Codex, and agent capabilities. Google is simultaneously testing a Gemini Mac app. Both signal the platform war shifting from browser to system-level.

So What: OpenAI’s consumer dominance hasn’t translated into enterprise stickiness the way Claude Code has. A desktop superapp is the consumer playbook — own the dock, own the default. But the timing suggests urgency, not strategy.

Now What: For enterprise teams, the desktop vs. browser vs. IDE question matters less than integration depth. A superapp on your dock that doesn’t connect to your systems is just a chatbot with better packaging.

It's Not About the Ceiling, It's About the Floor

Blank Metal — Fri, 27 Mar 2026 01:07:20 GMT

If your engineering and product workflow looks basically the same as it did 18 months ago, you’re behind. Not falling behind. Already behind.

And if you’re moving faster than ever but haven’t stopped to ask whether you’re building the right thing for real people, you might be in worse shape than the team that’s slow.

There’s no shortage of signal about where things are going. No matter if you believe the specifics, it’s clear that we’re on a trajectory and the ceiling is growing exponentially. Boris Cherny, Head of Claude Code at Anthropic, shipped 22 PRs in a single day, every one of them 100% written by Claude. He hasn’t manually edited a line of code since November 2025. Thibault Sottiaux, who runs Codex at OpenAI, says his team is now drowning in code review because agents produce so much output so fast. Vercel’s v0 has 3 million users, and a huge chunk of them aren’t developers. They’re PMs and designers shipping production code through prompts. Cat Wu, Head of Product for Claude Code at Anthropic, argues the traditional PM playbook breaks entirely when model capabilities improve exponentially mid-project.

What these massive changes in workflow make us all believe is that the ceiling on how fast and effective product and software development is being raised exponentially right now. And if you’re paying a lot of attention to what’s being published you may be thinking that you need to aim for a new ceiling - targeting a new ideal for this lifecycle in the new world.

But the ceiling isn’t your problem. The floor is. And the floor isn’t just about tools and speed. It’s about whether, in all this acceleration, you still know how to build things that matter to actual people.

The floor moved

There’s a new baseline for what it means to be competent as a PM or engineer. Not exceptional. Not bleeding-edge. Just competent. And a lot of people are still operating like it’s 2023.

We see this constantly. We meet with 5 - 10 prospective clients every week, and 85% of them are feeling the pain of this problem and looking for help. Teams where maybe one or two people have integrated AI into their actual workflow and the rest are kind of poking at it occasionally, or worse, treating it as someone else’s problem. The gap between “uses AI tools daily” and “tried ChatGPT once at a team offsite” is already massive. And strangely, it’s getting wider.

The thing is, nobody has yet written down what the new floor actually looks like. The ceiling gets all the blog posts. The new floor just quietly rises, the baseline changing, and pretty soon — you or your team is working in last year’s processes with antiquated tools.

So let’s write it down.

For Engineers

The floor isn’t “writes code faster with AI.” It’s deeper than that.

AI is part of your daily workflow. Not sometimes. Every day. Boris Cherny describes a clear progression at Anthropic: first AI helps you write code, then it handles the tedious stuff entirely, then you’re orchestrating multiple agents in parallel. “I have never had this much joy day to day in my work,” he says, “because essentially all the tedious work, Claude does it, and I get to be creative.” If you’re still at step zero, writing every line by hand, you’re the developer equivalent of someone in 2010 who refused to use Stack Overflow on principle. Nobody was impressed by the purity then either.

You can plan and spec work for agents, not just for yourself. Cherny put it plainly: “Once there is a good plan, it will one-shot the implementation almost every time.” The bottleneck has shifted from writing code to deciding what to build. The skill that matters isn’t “good at prompting.” It’s the ability to decompose a problem clearly enough that an agent can execute it. Think of it as writing really good user stories, except the reader is tireless, literal, and has perfect recall of your codebase.

You review AI-generated code like it matters. Because it does. Thibault Sottiaux, who leads Codex at OpenAI, says his team’s biggest complaint right now is that there’s too much code to review. That’s not a humble brag. It’s a real bottleneck. The developer who blindly ships agent output is worse than the developer who writes mediocre code by hand, because at least the second one understands what they shipped. The floor now includes the ability to critically evaluate code you didn’t write: catch the subtle bugs, notice architectural drift, know when the agent took a shortcut that’ll cost you two sprints next quarter.

You compound your work. Each cycle should make the next one easier. You document patterns. You build context that agents can reuse. Anthropic does this internally: Claude is improving Claude’s own scaffolding and toolchains. If you’re treating every task like a blank slate, you’re leaving the single biggest advantage on the table.

You know when to throw the AI’s work away. This might be the most underrated skill on the list. An agent can produce something fast, coherent, and completely wrong for the problem. The floor isn’t just knowing how to use AI. It’s knowing when the output doesn’t serve the person on the other end, and having the judgment to kill it and start over, or do the work yourself.

For Product Managers

The floor isn’t “uses AI to write PRDs.”

You prototype before you spec. Cat Wu makes this point well: write the spec, then hand it to an AI tool and see if it can build it. Guillermo Rauch, CEO of Vercel, is even more direct. v0 exists because the distance between “idea” and “working thing” should be measured in minutes, not sprints. The PM who shows up with a 15-page PRD and no prototype is now moving slower than the PM who shows up with a rough working demo and three questions. The floor is: you can get to a working thing, fast, and use it to test whether your idea holds up before you burn engineering cycles.

You plan in shorter cycles. Cat Wu nails this: “The traditional product management playbook is built on the assumption that what’s technologically possible at the start of a project is roughly what’s possible at the end.” That assumption is broken. Model capabilities shift mid-sprint. Features you scoped as “hard” become trivial when the next model drops. The floor-level PM reviews their roadmap against capability changes, not just customer feedback. If you’re not doing this, you’re making planning decisions with outdated information. (Which, to be fair, PMs have always done. But now the information goes stale in weeks, not months.)

You know the tools well enough to smell BS. You don’t need to be an engineer. But you need enough fluency to call it when someone says “we’ll just use AI for that” with zero plan. And enough to push back when engineering says something will take six weeks that an agent could realistically do in a day. The floor is technical literacy, not expertise. Enough literacy to make good calls.

You’re experimenting. Regularly. Vercel didn’t build v0 for developers alone. They built it for anyone on a product team who has ideas and wants to test them. The practitioners pulling ahead aren’t following a playbook. They’re building one. The floor-level PM has an experimentation habit. They’ve tried multiple AI tools in their actual work, formed actual opinions, and can articulate what works and what’s hype.

You’re still talking to customers. This sounds obvious. It isn’t. When you can prototype in an afternoon and ship by the end of week, the temptation is to just build and see what happens. But “see what happens” is not a product strategy or a legitimate way to get to product/market fit. The floor-level PM is moving faster and still validating with real people. Not A/B tests. Not analytics dashboards. Actual conversations with the messy, complicated humans who use what you build. Speed without signal is just expensive guessing.

What the floor is really about

Strip all the specifics away and it comes down to three things:

Speed of learning. The landscape is moving fast enough that the half-life of any specific workflow is maybe six months. The floor isn’t knowing the right tools. It’s the ability to pick up new ones quickly and fold them into how you work. The people falling behind aren’t the ones who picked the wrong tool. They’re the ones who stopped picking up tools altogether.

Comfort with imperfection. AI outputs aren’t perfect. Prototypes are rough. Agent-written code needs review. The old floor rewarded polish and certainty. The new floor rewards speed and iteration. If you’re waiting until something is perfect before you share it, you’re optimizing for a world that doesn’t exist anymore.

Taste. This one’s harder to teach, and it might be the most important. When everyone has access to the same AI tools, the differentiator is judgment. Knowing what to build, what to cut, what “good” looks like when you can generate ten options in an hour. Taste is the human skill that gets more valuable as AI gets better, not less.

The So What

If you’re a leader: audit your team against the floor, not the ceiling. How many of your engineers are using AI daily in their actual workflow? How many of your PMs have prototyped something with AI tools in the last month? How many of them talked to a customer this week? If the honest answer is “some” or “not sure,” the floor in your org is lower than the market floor. And that gap compounds fast.

If you’re an IC: be honest with yourself. Not about whether you’ve “tried AI” but about whether it’s actually changed how you work day-to-day. If your workflow looks basically the same as it did 18 months ago, you’re below the floor. Not because you’re bad at your job, but because the floor moved.

The good news: the floor is achievable. We’re not talking about becoming an AI researcher or rebuilding your entire skill set. It’s a handful of habits and a commitment to the experimentation loop. The people who’ve already made this shift will tell you it took weeks, not months.

The ceiling will keep rising. The companies building these tools will keep pushing what’s possible. That’s great. Someone needs to be doing that work.

It’s easier than ever to make stuff. It’s faster. And AI can be super confident about correctly making the wrong solution and/or a complete waste of time/talent/tokens. It doesn’t care if you’re right, just that you use more tokens.

It’s up to us, humans, to make sure we build the right things as well as we can.

Weekly Headlines: Issue #14

Blank Metal — Fri, 20 Mar 2026 13:03:48 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

The Reckoning

Three stories this week share a throughline: the costs of moving fast with AI are becoming visible. Token bills, comprehension gaps, and bubble economics are all different faces of the same question—what happens when the honeymoon ends?

You’ve Figured Out AI at Work—Now Comes the Bill

What: The Wall Street Journal reports that enterprises are hitting a new phase of AI adoption: the token bill. Companies that moved aggressively from pilots to production are discovering that AI inference costs scale faster than they expected. The productivity gains are real, but so is the compute bill—and most organizations didn’t budget for what production-scale AI actually costs.

So What: This is the hangover after the honeymoon. The first wave was “look what AI can do.” The second wave was “let’s put it everywhere.” The third wave—happening now—is “who’s paying for all these tokens?” This isn’t a reason to slow down, but it is a reason to be intentional about where AI creates enough value to justify the cost. Not every workflow needs a frontier model.

Now What: Audit your AI usage against actual business value. The 80/20 rule applies: a small number of AI-powered workflows are probably driving most of your value, while a long tail of lower-value uses are burning tokens. Right-size your model selection—use smaller, faster models for routine tasks and save frontier models for high-stakes decisions.

Comprehension Debt: The Hidden Cost Nobody’s Measuring

What: Addy Osmani coined “comprehension debt”—the growing gap between how much code exists in your system and how much any human genuinely understands. Unlike technical debt, which creates visible friction, comprehension debt grows silently until your system breaks and nobody can fix it. An Anthropic study found developers using AI assistance scored 17% lower on comprehension quizzes than control groups.

So What: Your team just shipped 10x faster. Congratulations—you now have 10x more code that nobody fully understands. Tests pass, CI is green, but when something breaks at 2am, the person on call has to reason about code they never wrote, never reviewed, and never internalized. This is a fundamentally different failure mode than technical debt.

Now What: Treat genuine understanding—not passing tests—as non-negotiable. One practical step: require that AI-generated code gets the same review depth as human-written code. If your team is skimming AI output because “it looks right,” that’s the debt accumulating. The teams building comprehension discipline now will be better positioned when the reckoning arrives.

Yes, AI Is a Bubble. The Interesting Question Is What Kind.

What: Derek Thompson and Paul Kedrosky make the case that AI is definitively a bubble—private AI spending will exceed $700 billion in 2026, representing 50-80% of quarterly GDP growth, more than the combined historical spending on 1930s public works, the Manhattan Project, Apollo, and the Interstate Highway System. But they argue it’s a “rational bubble”: each individual actor is behaving rationally, even as the collective outcome is economically unsustainable.

So What: The historical parallel that matters isn’t dot-com—it’s railroads. By 1900, railroads were 62% of U.S. market capitalization despite massive overbuilding, with half of peak-period track eventually abandoned. Tech now represents roughly 60% of the index. The bubble will pop, but the infrastructure will remain and reshape everything it touches. Anthropic doubled revenue in two months. OpenAI added $1B annualized revenue per week. Stripe reports AI companies growing faster than any previous generation.

Now What: Build on the infrastructure while the bubble funds it, but don’t mistake bubble economics for sustainable economics. The companies that thrive post-correction will be the ones generating real revenue from real workflows—not the ones burning venture capital on AI features nobody asked for. If your AI investment can’t justify itself on unit economics today, it won’t survive the correction.

The Human Variable

AI’s biggest open question isn’t technical—it’s human. How do 81,000 users actually feel about it? What happens to the people who built the systems? And why does every organization think it’s further along than it actually is?

What 81,000 People Actually Want from AI

What: Anthropic published the largest multilingual qualitative study of AI users ever conducted—80,508 Claude users across 159 countries. The headline finding: people don’t split cleanly into optimists and pessimists. Those who want emotional AI support are 3x more likely to also fear dependency on it. 81% say AI has already delivered on some aspect of their vision.

So What: The framing of “AI believers vs. skeptics” is wrong. Real users hold both simultaneously—they want the productivity gains (32% cite this as the primary delivered benefit) while worrying about job displacement (22.3%) and loss of autonomy (21.9%). Lower-income countries are significantly more optimistic than wealthy ones, which inverts the usual tech adoption narrative.

Now What: If you’re rolling out AI tools internally, don’t segment your workforce into supporters and resisters. Design adoption programs that acknowledge both the excitement and the anxiety—because the same people feel both. The “cognitive partnership” framing (17% of users describe AI this way) resonates more than “productivity tool.”

What Do Coders Do After AI?

What: Anil Dash, writing for the New York Times Magazine, draws a line that most AI commentary misses: “In the creative disciplines, LLMs take away the most soulful human parts of the work and leave the drudgery to you. In coding, LLMs take away the drudgery and leave the human, soulful parts to you.” He identifies two cohorts of coders—the 9-to-5 professionals facing devastating displacement, and the craftspeople watching their medium transform into something unrecognizable.

So What: 700,000 tech workers have been laid off in the last few years. We’ll be at a million soon. But the displacement isn’t uniform. The “journeyman coders” writing standardized business logic are the most vulnerable—that’s exactly the code LLMs generate best. Meanwhile, coders who see it as craft are experiencing a different kind of loss: their job is becoming “describing software” rather than writing it. Both are painful, but they require completely different responses.

Now What: If you manage engineering teams, this framework matters for retention and hiring. Your most valuable people aren’t the ones who write the most code—they’re the ones who understand why the system works. As Osmani’s comprehension debt concept makes clear, the ability to reason about code is becoming more valuable than the ability to write it. Hire for judgment, not velocity.

What’s Your AI Adoption Level?

What: Steve Yegge published an AI adoption maturity framework that’s resonating across the industry—a clear progression from “Not Using AI” through “AI-Assisted” to “AI-Native” with specific behaviors at each level. The framework maps where individuals and organizations actually sit versus where they think they are.

So What: Most organizations overestimate their AI maturity because they conflate tool access with adoption. Having ChatGPT licenses doesn’t make you AI-assisted any more than having a gym membership makes you fit. The framework exposes the gap between “we have AI tools” and “our workflows have fundamentally changed.”

Now What: Use this as a self-assessment. Where does your team actually sit—not where leadership thinks they sit? The honest answer shapes whether you need more tools, more training, or more workflow redesign. Most organizations discover they need the third one.

The Agent Economy

Design tools that replace designers. Enterprise leaders planning agent deployments. A strategist declaring the bubble debate over. The agent economy isn’t emerging—it’s arriving, and the market is repricing everything around it.

Google Launches “Vibe Design” with Stitch—Figma Drops 8%

What: Google Labs unveiled Stitch, an AI-native UI design platform with an AI canvas, smarter design agent, voice input, instant prototyping, and built-in design system support. The market reacted immediately—Figma’s stock dropped 8% on the announcement, now down 80% from its August 2025 IPO.

So What: This is the design tool version of what happened to coding: AI collapses the gap between intent and artifact. Stitch doesn’t just assist designers—it lets non-designers produce high-fidelity UI through natural language and voice. The stock reaction tells you the market believes this shift is structural, not incremental.

Now What: If your team is evaluating design tooling or hiring designers, watch this space closely. The question is shifting from “which design tool?” to “do we need the same number of designers?”—and the answer will look different in six months than it does today.

Aaron Levie: What 20+ Enterprise IT Leaders Are Actually Saying About AI

What: Box CEO Aaron Levie sat down with 20+ enterprise AI and IT leaders—particularly from regulated industries—and shared the emerging consensus. Agents are “clearly the big thing,” with enterprises moving from experimental chatbots to production agent deployments. But the infrastructure isn’t ready: governance models are immature, payment rails for machine-to-machine transactions don’t exist, and most organizations are still figuring out where agents fit in their org charts.

So What: When the CEO of a $5B enterprise software company reports from the field, it’s a demand signal. The shift from “chatbot pilots” to “agent deployments” is happening, but the gap between ambition and infrastructure is widening. Only one in five companies has a mature governance model for agent deployments. The rest are flying blind or moving slowly.

Now What: If you’re planning enterprise AI rollouts, governance and observability should be in your architecture from day one—not bolted on after agents are already running. The organizations that get agent governance right early will move faster later. The ones that skip it will hit a wall when the first production agent does something unexpected.

Ben Thompson: Why Agents Mean This Isn’t a Bubble

What: Ben Thompson makes his most definitive macro call on AI yet: we’re not in a bubble. His argument rests on three LLM paradigm shifts—ChatGPT (2022), reasoning models like o1 (2024), and agents via Opus 4.5/Claude Code (late 2025). Each shift addressed a core LLM weakness, and agents are the inflection that changes the economics. The key insight: agents don’t just require a better model—they require integration between model and harness, which means Anthropic and OpenAI are becoming the differentiated point in the value chain, not commoditized infrastructure.

So What: Thompson identifies two dynamics that separate agents from prior AI hype. First, agents dramatically reduce the number of humans needed to drive compute demand—a small number of people wielding agents creates exponentially more economic output than chatbot adoption ever could. Second, Microsoft’s decision to bundle Anthropic’s Claude into its new $99/seat E7 enterprise tier (via Copilot Cowork) is an admission that model-agnostic strategies don’t work for agents. If agents require integrated model+harness, the companies building that integration capture the profits.

Now What: If Thompson is right, the strategic question for enterprises shifts. It’s not “which model should we use?” but “which agent platform are we building on?” The model-agnostic approach that seemed prudent a year ago may now be a liability—because agents aren’t modular. For organizations evaluating AI investments, this argues for deeper commitment to fewer platforms rather than hedging across many.

The Practitioner’s Edge

Two tools this week that separate the people talking about AI from the people building with it.

The MCP Debate Settles: CLI for Developers, MCP for Organizations

What: A viral blog post declared “MCP is Dead” in favor of CLI tools, arguing that LLMs already know jq and curl so MCP wrappers add unnecessary complexity. Cloudflare responded with “Code Mode”—a new approach where AI agents write TypeScript against MCP tool APIs instead of using specialized tool-calling syntax, improving both performance and token efficiency by 47%.

So What: Both sides are right about different problems. CLI tools win for individual developers who already have the right access and know the tools. But MCP over streamable HTTP solves the enterprise problem: centralized tool servers with proper auth, shared infrastructure across teams, and audit trails. That’s the difference between one developer vibe-coding and an org shipping agents at scale.

Now What: Stop debating MCP vs. CLI as a binary. Use CLI tools where the developer already has access and the LLM already knows the tool. Use MCP servers where you need centralized governance, shared access, and auditability. Cloudflare’s Code Mode suggests the best of both worlds: MCP infrastructure with code-native invocation patterns.

Defuddle: The Markdown Converter LLM Workflows Need

What: Defuddle is a lightweight tool that converts any web page into clean Markdown with YAML frontmatter. Available as an API, browser extension, and bookmarklet—it also handles YouTube transcription. Think of it as a universal adapter between the messy web and the structured context that LLMs prefer.

So What: LLMs—especially in coding and workflow contexts—perform dramatically better with Markdown input than raw HTML or copy-pasted text. Every time you paste a URL into an AI tool and get a mediocre response, the problem is often the input format, not the model. Tools like Defuddle solve the “last mile” problem of getting clean context into AI workflows.

Now What: Add this to your AI toolkit. When feeding articles, documentation, or web content into AI workflows, convert to Markdown first. The token efficiency gains alone are worth it—but the real win is better AI output from cleaner input. For engineering teams, consider wrapping this in an MCP server for agent workflows.

Weekly Headlines: Issue #13

Blank Metal — Mon, 16 Mar 2026 13:53:10 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

The Platform Split

The AI market is fracturing into distinct ecosystems—and the governance frameworks being written now will determine which ones survive.

a16z: The Gen AI Consumer App Market Is Splitting in Two

What: a16z’s 6th Top 100 Gen AI Consumer Apps report reveals ChatGPT and Claude are diverging into fundamentally different platforms—ChatGPT becoming a consumer super-app (Expedia, Instacart, ads) while Claude goes deep on professional tooling (PitchBook, FactSet, Sentry). Only 41 apps overlap between the two ecosystems out of ~370 combined.

So What: The “iOS vs. Android” framing means enterprises choosing an AI platform are making a strategic bet on ecosystem direction, not just model quality. Claude Code hitting $1B ARR in six months proves coding agents are a real revenue category, not a feature.

Now What: Map your team’s AI usage patterns—are you building for consumer workflows or professional tooling? Your platform choice should follow the ecosystem that matches your use case, not the loudest brand.

34 Principles for AI Governance—But Zero Mentions of “Open”

What: The Future of Life Institute released a cross-partisan AI governance declaration with 34 principles designed for direct legislative translation: mandatory kill switches, superintelligence moratoriums, criminal executive liability, and pharma-style chatbot safety testing.

So What: This is the most legislative-ready AI governance framework yet—and the complete absence of open source, open weights, or right-to-run-locally language signals that regulation may default to a closed-model world if the open community doesn’t engage.

Now What: If your AI strategy depends on open-source models, monitor this closely. These principles are written to become law, and they could reshape what’s legally deployable.

AI-First Architecture Shifts

Enterprise software is fundamentally restructuring around AI agents as primary users, not just assistants for humans.

Box CEO: Build for Trillions of Agents, Not Just Humans

What: Aaron Levie argues that software architecture must shift to API-first design as AI agents become the primary users of enterprise applications, not humans.

So What: This reframes how enterprises should evaluate and build software—if your systems aren’t agent-accessible, they risk becoming legacy infrastructure in an agent-driven workflow era.

Now What: Audit your core systems for API coverage and consider whether your current vendors are building for human-only or agent-compatible futures.

Claude Gets Native Microsoft Office Integration

What: Anthropic upgraded Claude to work directly with Excel spreadsheets and PowerPoint presentations, allowing users to analyze, edit, and create Office documents within the AI interface.

So What: This closes a meaningful gap for enterprise teams who live in Microsoft’s ecosystem—reducing the copy-paste friction that slows down real-world AI adoption in document-heavy workflows.

Now What: Test Claude on a repetitive Office task your team dreads (quarterly report formatting, data cleanup) to gauge whether it’s ready to slot into existing processes.

Scaling AI in Production

Leading tech companies are moving beyond pilots to organization-wide AI integration, revealing both blueprints and cautionary tales.

Uber Reveals How It’s Scaling AI-Assisted Development

What: The Pragmatic Engineer offers an inside look at how Uber is integrating AI tools into its software development workflows across the organization.

So What: Real-world case studies from engineering-forward companies like Uber provide a practical blueprint for enterprise teams trying to move past pilot projects into scaled AI adoption.

Now What: Compare your AI development tooling rollout against Uber’s approach—particularly how they’re measuring productivity gains and managing adoption friction.

Amazon Mandates AI Tools Even When They Slow Workers Down

What: Amazon is pushing employees to use AI assistants across workflows company-wide, even in cases where the tools are reportedly reducing productivity rather than improving it.

So What: This signals a growing tension between AI adoption mandates and actual ROI—a cautionary tale for enterprise leaders feeling pressure to deploy AI everywhere, regardless of fit.

Now What: Audit your own AI rollouts for “mandate creep” and build feedback loops that let teams flag when tools hurt more than help.

The Agent Workflow Revolution

Autonomous coding agents are reshaping how product teams work and forcing a competitive reshuffling among AI providers.

LangChain Founder Explores How Coding Agents Transform Product Teams

What: Harrison Chase shared insights on how coding agents are reshaping workflows across engineering, product, and design functions.

So What: As coding agents mature beyond developer tools, enterprise leaders need to consider second-order effects on team structures, hiring, and cross-functional collaboration.

Now What: Assess whether your current org design accounts for AI-augmented roles beyond just engineering.

OpenAI Scrambles to Match Anthropic’s Coding Agent Lead

What: Wired reports that OpenAI is racing to catch up to Claude Code, Anthropic’s autonomous coding agent that has gained significant traction among developers.

So What: The competitive dynamics have flipped—OpenAI is now playing catch-up in the agentic coding space, which signals that enterprise teams shouldn’t assume market leaders will dominate every AI category.

Now What: If you’re evaluating coding agents, benchmark actual performance on your codebase rather than defaulting to vendor relationships—this space is moving too fast for brand loyalty.

The Privacy Backlash

As AI embeds deeper into daily life, the counter-reaction is creating its own market.

Counter-Surveillance Goes Consumer: Deveillance’s $1,199 Audio Jammer Goes Viral

What: Deveillance’s Spectre I—a portable device claiming to use AI to prevent nearby microphones from recording conversations—hit 4.3 million views and 42K bookmarks, despite security researchers questioning whether the tech delivers on its promises.

So What: The demand signal matters more than the product: consumer anxiety about always-on AI listening is translating into real willingness to pay for privacy tools. The counter-surveillance market is forming faster than the products to serve it.

Now What: For enterprise teams deploying AI in offices, meeting rooms, and customer spaces, the backlash against ambient recording is real. Factor privacy perception into your AI rollout strategy, not just compliance.

AI Investment at Any Cost

Enterprise leaders are treating AI transformation as a strategic imperative worth painful trade-offs, even cutting profitable operations to fund the shift.

Atlassian Cuts 10% of Staff to Fund AI Pivot

What: Atlassian is laying off roughly 10% of its workforce, redirecting the savings to accelerate its AI product investments.

So What: This signals that even profitable enterprise software companies are treating AI not as an add-on budget item but as a strategic priority worth painful trade-offs—expect more “self-funded AI transformations” across the industry.

Now What: If you’re building an AI business case, note that leadership teams are increasingly willing to make structural cuts to fund AI bets—frame your proposals accordingly.

Weekly Headlines: Issue #12

Blank Metal — Fri, 06 Mar 2026 14:04:03 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

Anthropic Refuses Pentagon Demands, Gets Blacklisted as “Supply Chain Risk”

What: Anthropic refused the Pentagon’s demand to remove all safeguards on military use of its Claude models — specifically protections against domestic mass surveillance and fully autonomous weapons. In response, President Trump directed all federal agencies to stop using Anthropic’s technology, and Defense Secretary Pete Hegseth designated the company a “supply chain risk” — a classification typically reserved for foreign adversaries like Huawei. The designation bars every defense contractor from doing business with Anthropic.

So What: This is unprecedented. An American AI company is being treated like a hostile foreign entity because it insisted on safety red lines. Anthropic’s CEO called the designation “legally unsound” and pledged to challenge it in court. The signal to every enterprise leader: the U.S. government is now willing to use economic coercion against American companies that set limits on how their technology is deployed. The Lawfare Institute’s legal analysis suggests the designation likely won’t survive judicial review, but the chilling effect on other AI companies is the point.

Now What: If your organization uses Anthropic products, don’t panic — this designation targets defense contractors, not commercial enterprises. But watch the legal challenge closely. The outcome will define the boundaries of AI safety commitments for the entire industry. Anthropic’s willingness to absorb this level of government pressure is either principled courage or an existential gamble. The market will decide.

OpenAI Cuts Pentagon Deal — Then Scrambles to Rewrite It

What: Hours after Anthropic was blacklisted, OpenAI announced it had reached a deal allowing the Pentagon to use its technology in classified environments. The deal included stated protections against mass surveillance and fully autonomous weapons. Then the backlash hit — hard. Internal employees were “fuming,” and CEO Sam Altman publicly admitted the announcement “looked opportunistic and sloppy” and that he “shouldn’t have rushed.” Within days, OpenAI and the Pentagon agreed to rewrite the contract language, adding explicit prohibitions against “deliberate tracking, surveillance, or monitoring of U.S. persons.”

So What: MIT Technology Review put it bluntly: “OpenAI’s compromise with the Pentagon is what Anthropic feared.” The speed of the backlash — and Altman’s rare public admission of error — reveals how politically charged military AI has become. The amended contract language is stronger, but the episode exposed a fundamental tension: OpenAI is simultaneously raising $110B from investors who want government contracts and employing workers who signed an open letter demanding guardrails. That tension isn’t going away.

Now What: Enterprise buyers should be watching the actual contract language, not the press releases. When two leading AI companies offer the same technology to the same customer with different safety terms, the terms matter. Ask your AI vendors: what are your red lines? The answer reveals their risk tolerance — and by extension, yours.

“We Will Not Be Divided”: 900 AI Workers Demand Military AI Red Lines

What: Nearly 900 employees at Google and OpenAI signed an open letter titled “We Will Not Be Divided,” urging their companies to join Anthropic in refusing the Pentagon’s demands. About 100 signers were from OpenAI, roughly 800 from Google, and half chose to attach their names publicly. The letter warns: “They’re trying to divide each company with fear that the other will give in.” By Monday, the letter’s momentum had accelerated after U.S. strikes on Iran raised the stakes of military AI use.

So What: This is the largest coordinated action by AI workers since Google’s Project Maven protests in 2018 — but the context is different. In 2018, employees objected to their employer’s contract. In 2026, employees are organizing across competing companies to defend a rival’s position. That’s a remarkable shift. It signals that a significant cohort of AI researchers and engineers view military AI guardrails as a shared professional standard, not a competitive differentiator.

Now What: If you’re hiring AI talent, understand that military AI policy is now a retention factor. Top engineers are choosing employers based on ethical commitments, not just compensation. The letter’s cross-company solidarity suggests that talent will flow toward companies with clear guardrails — and away from those without them.

OpenAI Raises $110B at $730B Valuation — The Largest Private Funding Round in History

What: OpenAI closed $110 billion in new funding — $50B from Amazon, $30B from Nvidia, $30B from SoftBank — at a $730 billion pre-money valuation. The round jumped from a $500B valuation just four months earlier. As part of the deal, AWS becomes the exclusive third-party cloud distributor for OpenAI Frontier, and the companies are scaling their compute agreement to 2 gigawatts of Trainium chips.

So What: The numbers are staggering, but the structure is the story. Amazon isn’t just investing — it’s locking OpenAI into AWS infrastructure. Nvidia isn’t just investing — it’s guaranteeing demand for its hardware. SoftBank isn’t just investing — it’s building on its Stargate joint venture. Each investor is buying strategic positioning, not just equity. The valuation implies investors believe OpenAI will generate revenue comparable to the world’s largest software companies within 3-5 years. That’s either conviction or collective delusion, and there’s no middle ground at $730B.

Now What: For enterprise AI strategy, the Amazon-AWS exclusive distribution deal matters more than the dollar amount. If your organization runs on AWS, OpenAI models through Bedrock just became a first-class integration path. If you’re multi-cloud, this exclusivity may push you toward specific infrastructure choices you didn’t plan to make.

“The Week the AI Jobs Wipeout Got Real”

What: Three major publications converged on the same story simultaneously. The Wall Street Journal declared it “the week the dreaded AI jobs wipeout got real” after Block CEO Jack Dorsey laid off 4,000 people. Bloomberg reported that AI coding agents are “fueling a productivity panic” — engineers are working longer hours, not fewer, as the race to ship AI-augmented output intensifies. The New York Times documented India’s back-office industry beginning to contract as AI automation reaches outsourced knowledge work. Meanwhile, Harry Stebbings reported that three founders with 500-1,000 employees are all planning minimum 20% headcount cuts.

So What: The narrative shifted this week from “AI might displace workers someday” to “it’s happening now, at scale, at named companies.” But the Bloomberg data complicates the simple “AI replaces humans” story — the engineers still employed are working more, not less. AI isn’t eliminating work; it’s compressing the timeline for what’s expected and raising the bar for output per person. The Dallas Fed’s research confirms the paradox: AI is simultaneously aiding and replacing workers, with the balance depending entirely on the role.

Now What: If your organization hasn’t modeled what 20-30% more output per knowledge worker looks like — in terms of capacity planning, team structure, and career paths — you’re behind. The question isn’t whether headcount will change. It’s whether your organization will proactively redesign work around AI capabilities or reactively cut heads when competitors do.

Amazon and OpenAI Unveil Stateful Runtime Environment for AI Agents

What: Buried in the $50B Amazon-OpenAI partnership announcement is a product that could reshape enterprise AI architecture: the Stateful Runtime Environment, launching on Amazon Bedrock. Instead of stitching together disconnected stateless API calls, agents get persistent working context — memory that carries forward, tool and workflow state, environment access, and identity boundaries. Think of it as the difference between an intern who forgets everything between conversations and a colleague who remembers the project.

So What: This directly addresses the biggest engineering bottleneck in production AI agents: state management. Today, every enterprise building agentic workflows has to build its own orchestration layer — storing state, managing tool invocations, handling errors, maintaining permissions. OpenAI and Amazon are saying: stop building that plumbing, use ours. If it works as described, this could collapse months of custom agent infrastructure into a managed service. The InfoWorld analysis frames it as a “control plane power shift” — whoever owns agent state owns the agent ecosystem.

Now What: If your team is building agentic workflows on AWS, request early access to the Stateful Runtime Environment immediately. If you’ve already built custom agent orchestration, evaluate whether this managed service could replace it. The risk of building on proprietary infrastructure is lock-in; the risk of not building on it is rebuilding what Amazon gives away for free.

Scott Belsky: “The Orchestration Layer Is the New Interface Layer”

What: Former Adobe CPO Scott Belsky declared that the critical layer in enterprise AI has shifted: “The orchestration layer is the new interface layer. As we spend our day coordinating agent workflows — in a model-agnostic fashion, local and cloud — and validating outputs, the ultimate layer to own is where coordination takes place.” This represents an evolution from his earlier thesis that Interface > Data > Models, now placing orchestration at the top of the stack.

So What: Belsky is naming what enterprise architects are discovering in practice: the competitive advantage in AI isn’t which model you use — it’s how you coordinate multiple agents, validate their outputs, and manage the human-in-the-loop decision points. This maps directly to what Box CEO Aaron Levie said separately — that agents need their own computer and filesystem, making the orchestration of those environments the key architectural challenge. When two of the most influential product thinkers in tech converge on “orchestration is the new interface,” it’s worth paying attention.

Now What: Evaluate your AI architecture through this lens: who owns the orchestration layer? If the answer is “nobody yet” or “we’re building it ad hoc,” that’s your highest-leverage investment. The companies that build robust orchestration — agent coordination, output validation, approval workflows, state management — will compound their AI capabilities faster than those still debating which model to use.

Simon Willison: The Practitioner’s Guide to Agentic Engineering

What: Simon Willison — creator of Datasette, Django co-creator, and one of the most respected voices in practical AI engineering — published “Agentic Engineering Patterns,” a growing guide to getting the best results from coding agents. The standout chapter, “Hoard Things You Know How to Do,” argues that the most valuable asset in an agent-driven workflow isn’t the model — it’s your accumulated collection of working examples, proof-of-concepts, and documented solutions. Coding agents make these hoarded assets dramatically more valuable because they can be recombined and adapted at machine speed.

So What: This is the practitioner’s answer to all the theoretical “agents will replace developers” discourse. Willison’s patterns — red/green TDD with agents, specific prompt structures, building personal knowledge repositories — are battle-tested techniques from someone shipping real software with AI daily. The core insight is counterintuitive: the more capable AI coding agents become, the more valuable human experience becomes, because experience is what tells you which problems are solvable and which approaches will work.

Now What: If your engineering team is adopting AI coding tools, Willison’s guide should be required reading. Start with the “hoard” principle: document your solutions, build proof-of-concepts, keep working examples of everything. These become compound assets — every problem you’ve solved once becomes a template for AI to solve similar problems faster.

Harry Stebbings: VC and PE Firms Must Deploy Their Own Autonomous Agents

What: Harry Stebbings argued that the deciding factor for investment firms in 2026 isn’t which AI tools they use — it’s whether they’ve deployed autonomous agents that actually do work. The shift from “AI as copilot” to “AI as team member” is the transition that unlocks real operational leverage. Separately, Hiten Shah reinforced the pattern: “This is one manifestation of what SaaS morphs into soon — deploy an agent per client.”

So What: This directly validates what some PE firms are already discovering — that the firms deploying agents for deal research, portfolio monitoring, and operational analysis are pulling ahead of those still using AI as a search engine. The “agent per client” framing from Shah is particularly provocative: it suggests the SaaS business model itself evolves from “software you access” to “agents that work for you.” Investment firms that treat AI adoption as a tool-selection exercise are missing the architectural shift underneath.

Now What: If you’re in PE or VC, ask: do you have agents that run autonomously — doing research, monitoring portfolios, generating reports — or do you have people prompting chatbots? The gap between those two is the gap between incremental efficiency and structural competitive advantage. Start with one high-value workflow (deal screening, competitor monitoring, portco reporting) and build an agent that runs it end-to-end.

Anthropic’s AI Fluency Index: It’s Not How Much You Use AI — It’s How Well

What: Anthropic published the AI Fluency Index, tracking 11 observable behaviors across nearly 10,000 Claude conversations to measure how effectively people collaborate with AI. The key finding: 85.7% of conversations showed iteration and refinement — users building on previous exchanges rather than accepting the first response. Users who iterate exhibit 2.67 additional fluency behaviors on average, roughly double the rate of those who don’t.

So What: This reframes the enterprise AI adoption conversation from “how many people are using it” to “how well are they using it.” Most organizations measure AI adoption by login counts and message volume. Anthropic is arguing those are vanity metrics. The behaviors that predict better outcomes — iterating, clarifying goals, questioning the model’s reasoning, identifying missing context — are teachable skills, not innate abilities. That makes AI fluency a training problem, not a technology problem.

Now What: Stop measuring AI adoption by usage volume. Start measuring by behavior quality. The 11 fluency behaviors Anthropic identified are a ready-made rubric for enterprise training programs. If your team accepts Claude’s first response without iteration, you’re leaving most of the value on the table.

Weekly Headlines: Issue #11

Blank Metal — Fri, 27 Feb 2026 14:02:38 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

Anthropic Enterprise Event Rattles — Then Rallies — Software Stocks

What: Anthropic hosted an enterprise agents event in New York that initially spooked software investors, then calmed them. The company showcased Claude Cowork integrations across finance, legal, HR, and engineering — but emphasized that Claude needs data from existing software vendors to be useful. Software stocks that had been hammered 25-30% in 2026 rallied on the news.

So What: Wall Street analysts from Deutsche Bank, Jefferies, and William Blair reached the same conclusion: Anthropic is positioning itself as an “intelligence infrastructure” layer on top of existing enterprise software, not a replacement for it. The “SaaSpocalypse” narrative may be overdone — model providers need the data and workflows that incumbents control.

Now What: If your team has been waiting out the AI-disruption panic before making software purchasing decisions, this is a signal to reengage. The winning enterprise stack will likely be incumbents plus AI orchestration, not one replacing the other.

OpenAI Partners with BCG, McKinsey, Accenture, and Capgemini to Deploy Enterprise Agents

What: OpenAI announced “Frontier Alliances” — multi-year partnerships with BCG, McKinsey, Accenture, and Capgemini to help enterprises deploy AI agents at scale through its Frontier platform. Each firm is building dedicated practice groups certified on OpenAI technology with access to product and research teams.

So What: OpenAI is publicly acknowledging that model intelligence isn’t the bottleneck — implementation is. By enlisting the Big Four consulting firms, they’re conceding that enterprise AI adoption requires strategy, change management, workflow redesign, and systems integration that a model provider alone can’t deliver.

Now What: Enterprise leaders should watch which consulting partners develop genuine AI deployment capability versus those just rebranding existing practices. The firms that invest in certified technical teams will separate from those selling AI strategy decks.

OpenAI Ships a Product with Zero Manually-Written Code

What: OpenAI published “Harness Engineering” — a detailed account of building and shipping an internal product with zero lines of human-written code. Using Codex agents, a team of three engineers produced roughly a million lines of code across 1,500 merged PRs in five months, averaging 3.5 PRs per engineer per day.

So What: This isn’t a demo — it’s a production product with daily internal users. The most revealing insight: their bottleneck shifted from writing code to building “scaffolding” — the docs, linters, architectural constraints, and feedback loops that let agents do reliable work. The engineer’s job became designing environments, not writing implementations.

Now What: Start treating your AGENTS.md, CI configuration, and architectural documentation as first-class engineering artifacts. In an agent-heavy workflow, the quality of your scaffolding determines the quality of your output.

Claude Code Security Finds 500+ Bugs That Humans Missed

What: Anthropic launched Claude Code Security, an AI vulnerability scanner that reasons about codebases like a human security researcher rather than pattern-matching against known CVEs. Using Opus 4.6, it found over 500 bugs in production open-source code that had survived expert review. It’s in limited preview for Enterprise/Team customers; open-source maintainers get free access.

So What: This is now a two-horse race with OpenAI’s Aardvark security agent (launched four months earlier). As AI-generated code proliferates, AI-powered security review is shifting from “nice to have” to “essential counterbalance.” The human-in-the-loop design — nothing gets patched without developer approval — is the right trust model for enterprise adoption.

Now What: If your team ships AI-generated code, you need AI-powered security review in the pipeline. Evaluate both Claude Code Security and Aardvark against your actual codebase — the tool that catches bugs your team missed is the one worth adopting.

Every Publishes Editorial Guidelines — Written for AI Agents

What: Media company Every published editorial guidelines explicitly stating they write for both human readers and AI agents. Technical guides are “specifically optimized to serve as instructions for agents.” They also use a tool called Proof to track text provenance — which text is human-written versus AI-generated.

So What: This is the first major media company to publicly declare “agent-readable” as a design goal alongside “human-readable.” Just as “mobile-friendly” became a content standard a decade ago, “agent-friendly” content may be next. The provenance tracking via Proof signals that transparency about AI authorship is becoming table stakes.

Now What: Audit your own content — documentation, knowledge bases, SOPs — through an agent-readability lens. If AI agents will consume your content to take action on behalf of your customers or employees, structure and clarity matter more than ever.

Notion Ships Custom Agents That Run Autonomously Across Tools

What: Notion launched Custom Agents — autonomous AI teammates that operate continuously across Notion, Slack, email, calendar, Figma, and Linear. Setup is describe-and-trigger: the agent writes its own instructions and wires up its own tools. Early adopters include Ramp (300+ agents) and Remote (saved 20 hours/week replacing their IT help desk).

So What: The “agents as teammates” framing is becoming the default product paradigm for productivity software. Notion’s approach — agents that monitor channels, capture requests, enrich data, and route information without human prompting — shows how AI features are evolving from “ask a question” to “run a workflow.”

Now What: If your team uses Notion, start with one high-volume, low-risk workflow (FAQ routing, sprint reporting, request triage) and build a Custom Agent. The learning curve is in identifying which workflows benefit from always-on monitoring versus on-demand AI assistance.

Pete Koomen: Most AI Apps Are “Horseless Carriages”

What: YC Partner Pete Koomen argues that most AI applications are failing because they mimic old software design patterns instead of rethinking around AI capabilities. His central example: Gmail’s AI draft feature produces generic, formal emails that take longer to prompt than to write manually — while a properly designed system prompt would let users teach the AI their voice once and reuse it forever.

So What: The core insight is about who should write the system prompt. In traditional software, developers define behavior and users provide input. But when an AI agent acts on your behalf, you should be teaching it how to behave — not accepting a one-size-fits-all version designed by committee. “Most AI apps should be agent builders, not agents.”

Now What: If you’re building or buying AI tools, ask this question: does the product let users customize the system prompt, or does it force a generic experience? The tools that let users teach the AI their specific context will win.

Devin Ships Its Biggest Update Since Launch

What: Cognition released the largest update to Devin — the AI software engineering agent — since its initial launch. The update expands Devin’s ability to handle multi-file changes, longer-running tasks, and more complex codebases autonomously.

So What: The AI coding agent space is now a genuine multi-player competition: Codex, Claude Code, Devin, and Cursor are all shipping major capability updates within weeks of each other. Karpathy’s observation about the pace of change (see below) isn’t hyperbole — the tooling landscape is shifting faster than most engineering teams can evaluate.

Now What: If you evaluated Devin six months ago and passed, it’s time to re-benchmark. The competitive pressure between these tools is driving capability improvements at a pace where quarterly reevaluation is more appropriate than annual.

Aaron Levie: Jevons Paradox Means More Demand for Engineering, Not Less

What: Box CEO Aaron Levie argues that lowering the cost of engineering through AI won’t reduce demand — it will increase it. Citing Jevons Paradox (when a resource becomes cheaper, total consumption increases), he makes the case that cheaper software creation means more software gets built, not fewer engineers get hired.

So What: This directly challenges the “AI will replace developers” narrative. If Levie is right, enterprises should be planning for a world where AI dramatically increases the surface area of what gets built — requiring more engineering judgment, architecture, and oversight, even as the per-unit cost of code drops. The services firms that help enterprises navigate this expansion will be busier, not obsolete.

Now What: Reframe your AI investment thesis: instead of “how many developers can we cut,” ask “what could we build if development cost 10x less?” The organizations that treat AI coding tools as expansion enablers rather than headcount reducers will capture disproportionate value.

Karpathy: Programming Changed More in Two Months Than in Ten Years

What: Andrej Karpathy — former Tesla AI chief, OpenAI founding member — states that programming has changed more in the last two months than in the previous decade, driven by the rapid advancement of AI coding tools.

So What: When someone with Karpathy’s credibility and vantage point makes this claim, it’s worth taking seriously. The pace of change in developer tooling — Codex, Claude Code, Devin, Cursor — is compressing what used to be years of incremental improvement into weeks. For non-technical leaders, this means the assumptions behind your 2026 engineering plans may already be outdated.

Now What: If your engineering team hasn’t fundamentally revisited their tooling and workflow in the last 90 days, they’re falling behind. The gap between teams leveraging AI coding tools and those that aren’t is widening fast.

The Real Shift Behind Enterprise Agents

Blank Metal — Tue, 24 Feb 2026 14:48:05 GMT

Today, February 24, 2026, Anthropic announces the launch of even more enterprise agentic capabilities with Claude that will enable highly useful task performance across all knowledge work, especially in Sales, Legal, Finance, and Operations. Blank Metal is pleased to announce our participation in this launch as an implementation partner.

What We Thought Agents Would Be

For two years, the market has talked about AI agents like they were digital employees. Point-and-click builders. Autonomous bots you’d deploy to handle discrete workflows—one for contracts, one for CRM maintenance, one for financial reporting.

The mental model was delegation: identify a task, hand it to a bot, receive the output. Same shape as human delegation, just faster and cheaper.

What we got is so much better.

What Anthropic Built Instead

Anthropic’s enterprise agentic capabilities launch builds on Claude Cowork—but the architecture isn’t “build and deploy discrete bots.” Instead, it deconstructs and reimagines work itself.

The building blocks:

Skills: Discrete capabilities you teach Claude—checking SOWs for margin, applying brand style, researching prospects.

Connectors: Links to your systems via MCP, giving Claude direct access to your enterprise context.

Commands: Workflow shortcuts that bundle common operations.

Subagents: For complex work, Claude delegates to other Claudes with specialized configurations.

These bundle into Plugins—shareable packages that turn Claude into a domain specialist. Here’s what makes this architecture powerful: Anthropic is shipping foundational plugins for domains like Sales, Legal, Finance, and Operations. These aren’t closed systems—they’re starting points. Each plugin establishes a baseline capability that elevates everyone’s Claude immediately. Your Sales team gets sophisticated pipeline analysis and deal coaching out of the gate. Your Finance team gets month-end reconciliation logic and variance analysis built in.

But then each person can extend those foundations with their own skills. Your Head of Sales can add your specific qualification criteria and competitive positioning. Your Controller can layer in your cost allocation rules and reporting templates. The published plugin gives everyone a sophisticated baseline. Your custom additions make it yours.

This is fundamentally different from building agents from scratch or buying point solutions. You’re not starting from zero, and you’re not locked into someone else’s complete vision. You’re building on a foundation that keeps getting better as Anthropic and the community contribute new capabilities.

The key insight: you’re not sharing agents that orchestrate workflows. You’re sharing the underlying skills and recipes that any agent can use.

One universal agent. Continuously uploadable capabilities. Not a workforce of specialized agents, but a single collaborator that keeps getting better at more things.

This requires a completely different relationship with the machine.

The Capabilities Organizations Need to Develop Making this leap means building new organizational muscles:

Capability decomposition. The old paradigm asks “which agent handles this task?” The new paradigm asks “what skills does this task require?” That means breaking work into teachable components—not “handle expense reports” but “verify receipt amounts against submitted totals >> apply company travel policy logic >> flag outliers for review >> format approvals for the finance system.” Many people have never articulated their work at this level.

Taxonomy building. You’re not just teaching Claude one skill. You’re building a structured map of what your organization actually does—the underlying capabilities that combine into the workflows you run every day. This becomes an organizational asset that compounds over time.

Real-time evaluation. With traditional agents, engineering teams could evaluate a shared agent’s outputs against expected outcomes. In the Claude Cowork model, each user runs their own configuration of capabilities. There’s no single “agent” to QA centrally. You become responsible for evaluating your own outputs—or for building skills that do evaluation for you.

Non-linear delegation. When you delegate to a person, you’re renting a bundle of pre-existing capabilities. They know how to write emails, navigate systems, and apply judgment. With Enterprise Agents, you start with shared organizational context and connectors, but then you’re building capability bundles yourself—skill by skill, connector by connector. It’s not “hire someone who knows the job.” It’s “teach a universal intelligence your specific version of the job.”

The good news: Claude Cowork with access to your enterprise context can get you pretty far out of the gate. And these skills compound. Organizations that invest in shared context and capability decomposition now will have skill libraries that grow more valuable every month.

It’s not automation. It’s capability architecture.

Why Blank Metal Is an Implementation Partner

At Blank Metal, we’ve been living this transformation. Over the past year, Claude Code and similar AI coding assistants fundamentally changed how our engineering team works. We stopped thinking “write this code” and started thinking “solve this problem.” We built capability taxonomies, learned to decompose our workflows into teachable skills, and came out the other side thinking differently about what software development even is.

We recognize this pattern. Enterprise Agents is the same shift—now available to everyone, not just developers.

Here’s what we know from experience: technical work isn’t the bottleneck. The hard work is helping teams shift from “automate this task” to “what capabilities does this task require?”

The engagement pattern we’re seeing:

First, connector infrastructure—ensuring MCPs exist for internal systems so agents can access critical company knowledge.

Second, capability mapping—the difficult work of decomposing organizational processes into teachable skills.

Third, adoption enablement—helping people internalize a new mental model for human-AI collaboration.

This isn’t about building something and handing it off. It’s about changing how your organization thinks about work itself.

The Moment of Choice

Anthropics expanded enterprise agentic capabilities for Claude are live. The organizations that recognize this as a paradigm shift will build capability libraries that compound over time. The ones looking for discrete task automation will find plenty of options. They just won’t be the ones defining what enterprise AI looks like in two years.

The architecture is here. The question is whether your organization is ready to think about work differently.

Ready to start mapping your capabilities? Let’s talk.

Weekly Headlines: Issue #10

Blank Metal — Fri, 20 Feb 2026 14:03:18 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

NVIDIA Open-Sources Two-Way Voice Model for Real-Time Conversation

What: NVIDIA released an open-source voice model capable of simultaneous listening and speaking—mimicking natural human conversation dynamics rather than turn-based exchanges.

So What: This removes a major friction point in voice AI applications; enterprises building customer service agents, copilots, or voice interfaces now have a free, production-ready foundation for more natural interactions.

Now What: If you’re evaluating voice AI vendors, benchmark this against paid alternatives—open-source parity is accelerating faster than most procurement cycles assume.

Vertical SaaS Founder Says LLMs Will Gut His Own Industry

What: A founder who built traditional vertical SaaS argues that LLMs are collapsing core software moats—proprietary UI, workflow complexity, data aggregation—into simple chat interfaces, reducing years of engineering to “one week of writing.”

So What: If this 12-24 month disruption timeline holds, enterprise leaders buying or building vertical software need to reassess whether they’re investing in durable value or soon-to-be-commoditized features.

Now What: Audit your current vertical software stack through this lens—which vendors are truly differentiated by domain expertise versus UI complexity that AI could flatten?

OpenAI Open-Sources GABRIEL for Automated Qualitative Research

What: OpenAI released an open-source Python toolkit that uses GPT to convert qualitative data like interviews, social media posts, and images into quantitative measurements at scale—replacing manual coding work.

So What: Enterprises sitting on mountains of unstructured customer feedback, support transcripts, or internal surveys now have a legitimate pathway to extract structured insights without building custom pipelines or hiring research teams.

Now What: If your org has qualitative data gathering dust, pilot GABRIEL on a contained dataset to see if it can surface insights your current analytics miss.

OpenAI Bets Codex’s Future on GUI, Not Terminal

What: In a new interview, OpenAI’s Codex team revealed 5x growth since January to over a million weekly users, shipped GPT-5.3 Codex alongside their fastest coding model “Spark,” and explained why they’re prioritizing graphical interfaces over terminal-based workflows.

So What: The explicit contrast with Claude Code’s terminal-first approach signals a strategic fork in how major AI labs think enterprise developers want to interact with coding agents—and their emphasis on code review (not generation) as the next bottleneck suggests where tooling investments may shift.

Now What: If you’re evaluating coding agents, test both paradigms with your actual workflows—the GUI vs. terminal split may matter more for adoption than underlying model capability.

OpenAI Acquires OpenClaw Creator to Boost Agent Push

What: Peter Steinberger, creator of OpenClaw, is joining OpenAI to work on agentic AI development.

So What: OpenAI is aggressively recruiting founders with deep experience building developer tools and document processing—capabilities that matter for enterprise agents that need to read, manipulate, and act on business documents.

Now What: Watch for OpenAI’s agent capabilities to improve around document handling, a common pain point in enterprise automation workflows.

Sinofsky: AI-Native Companies Will Define the Next Era

What: Former Microsoft exec Steven Sinofsky argues that companies building their core products with AI—not just adding AI features—will become the platform leaders of this generation, comparable to how Microsoft owned Windows, Google owned web, and Facebook/Uber owned mobile.

So What: This framing challenges enterprises to honestly assess whether they’re treating AI as a feature bolt-on or a foundational capability—a distinction that may determine who leads and who follows in the next decade.

Now What: Audit where AI sits in your org: is it enhancing existing workflows, or fundamentally reshaping how your core product gets built and delivered?

Perplexity’s Model Council Pits Three AI Giants Against Each Other

What: Perplexity now runs queries across Claude, GPT, and Gemini simultaneously, then uses a fourth model to synthesize where they agree, disagree, and what each uniquely contributes.

So What: The feature itself is basic, but it validates a strategic bet: as model performance varies by task, the real value shifts to the orchestration layer—knowing which model to use when and how to reconcile conflicting outputs.

Now What: If you’re building AI applications, start thinking about multi-model routing and synthesis as a core capability, not an edge case.

Former GitHub CEO Raises $60M to Reimagine Developer Tools for AI Agents

What: Nat Friedman’s new startup Entire has raised $60M to build a developer platform designed from the ground up for AI agents, not human coders.

So What: This is a serious signal that foundational dev infrastructure may need rebuilding—GitHub, built for human collaboration, may not be optimized for how AI agents read, write, and manage code at scale.

Now What: Engineering leaders should start asking whether their current toolchains will bottleneck agent-assisted development as adoption accelerates.

Box CEO Calls for New Agent Identity Standards

What: Aaron Levie argues that AI agents need their own distinct identities within enterprise platforms, requiring a fundamental rethink of authentication and authorization frameworks.

So What: As agents increasingly act on behalf of employees—accessing systems, making decisions, moving data—current identity models built for humans won’t cut it, creating both security gaps and audit nightmares.

Now What: Start mapping which systems your AI tools access today and whether your IAM framework can distinguish between human and agent actions.

Figma and Anthropic Bridge AI Code to Visual Design

What: Figma’s new Code to Canvas feature lets designers import Claude Code output directly into Figma as editable design components.

So What: This closes a critical gap in AI-assisted product development—code generated by AI can now flow back into design tools, potentially accelerating the prototype-to-production loop for teams using both platforms.

Now What: If your product team spans design and engineering, explore whether this integration could reduce handoff friction in your current workflow.

Weekly Headlines: Issue #9

Blank Metal — Fri, 13 Feb 2026 15:02:54 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

Agent Infrastructure & Governance

The bottleneck isn’t building agents — it’s running them reliably, safely, and at scale.

Former GitHub CEO Raises $60M to Manage AI Agent Fleets

What: Thomas Dohmke launched Entire, a dev platform designed to track and govern code produced by AI agents, starting with an open-source CLI that captures the full reasoning context behind AI-generated commits.

So What: This validates what many teams are discovering firsthand—the real bottleneck isn’t generating code with AI, it’s reviewing and governing what actually ships. Existing Git workflows weren’t built for machine-speed output.

Now What: If your engineering org is scaling AI coding tools, start auditing where human review is already becoming the constraint—that’s likely where you’ll need new tooling or processes first.

Warp Bets Agent Orchestration Is the Real Enterprise Bottleneck

What: Warp launched Oz, cloud infrastructure for scheduling, governing, and running coding agents at scale—complete with cron triggers, sandboxed environments, and audit trails. The platform already writes 60% of Warp’s own PRs.

So What: The hard part isn’t getting agents to work. It’s getting them to work reliably, safely, and repeatedly without human babysitting. Warp is betting that orchestration—not the agents themselves—is where the real enterprise value sits.

Now What: If you’re running agents in production (or planning to), audit your current orchestration stack. The gap between “demo-ready” and “enterprise-ready” is exactly where tools like this aim to live.

Claude Cowork Comes to Windows—Leveling the AI Desktop Playing Field

What: Anthropic shipped Claude Cowork for Windows, bringing the same AI desktop assistant that’s been a big unlock for Mac users to the PC ecosystem.

So What: Mac users are used to having first access to tools, while PC users have been largely limited to Microsoft-supported options. This matters in enterprise: most corporate desktops are Windows. Getting AI that feels like a real collaborator—not just a chat window—onto PCs opens the door for millions of knowledge workers who’ve been watching from the sideline.

Now What: If your org has been waiting for AI desktop tools that aren’t locked into the Microsoft ecosystem, this is worth a pilot. The “pick a folder” simplicity may move faster than a Copilot rollout stuck in security review.

The SaaS Reckoning

SaaS isn’t dead — but the business model that sustained it is under structural pressure.

The Big 4 Consulting Unbundling Has Started

What: Bitwise CEO Hunter Horsley draws a parallel between the Craigslist unbundling of 2006 and what’s happening to professional services firms like PwC—every service line on their website is work that agentic systems can now do faster and cheaper.

So What: The difference from 2006: enterprises don’t have to wait for a startup to build the disruption and hope M&A works out. They can build the agentic version themselves, now. The path is clearer—hire a team, build the capability, own the asset.

Now What: Most enterprises know they need to move. They’re just stuck on where to start. Identify one consulting-heavy workflow and scope what the agentic version looks like.

Ben Thompson: The SaaS Wall Is Structural, Not Cyclical

What: Ben Thompson argues the SaaS downturn isn’t a dip—it’s a permanent shift from growth companies to stable businesses. Seat-based pricing breaks when headcount stagnates or shrinks. Systems of record remain defensible, but discretionary tools face disruption from AI-native alternatives that do the same job without the per-seat tax.

So What: This is the distinction enterprise buyers need to internalize: your CRM and ERP aren’t going anywhere, but the layer of tools around them—the ones your teams adopted during the growth era—are vulnerable. When agents can perform tasks across systems, the “good enough” SaaS tool that lives on inertia loses its moat overnight.

Now What: Audit your software stack in two buckets: systems of record (defensible, keep) and discretionary tools (exposed, renegotiate or replace). Your leverage as a buyer has never been higher.

Listen here

a16z’s Anish Acharya: The “SaaS Apocalypse” Is a Myth—But the Moats Are Changing

What: a16z general partner Anish Acharya calls the “SaaS is dead” narrative overblown, but argues the real shift is significant: AI agents are breaking the lock-in legacy software relied on. Meanwhile, consumers are happily paying $200+/month for tools like Claude and Grok—not because they’re for everyone, but because they’re 100x better for someone. He also frames the dev tools market (Cursor vs. Claude Code) as looking more like Cloud than Uber vs. Lyft.

So What: Two things to watch: (1) SaaS as a delivery model survives, but SaaS as a moat erodes when agents can move data between systems and perform tasks across tools. Switching costs are dropping. (2) The willingness to pay $200+/month for AI tools that actually work signals that the market is bifurcating—power users will pay dramatically more for dramatically better tools, while commodity features race to zero.

Now What: If you’re evaluating enterprise software, the new buying criteria isn’t “what does this tool do?” It’s “how well does this tool work with agents?” And if you’re selling software, watch your per-seat pricing—the market is moving toward value-based models fast.

Listen here

Models & Code Abundance

Model capabilities are commoditizing fast — the strategic question is shifting from “which model?” to “what do you build on top?”

Six Major AI Releases in a Single Day — The Pace Is the Headline

What: February 12 saw six major AI releases hit simultaneously: OpenAI shipped GPT-5.3-Codex-Spark on Cerebras hardware (1,000+ tokens/sec for real-time coding), Google launched Gemini 3 Deep Think (new #1 on math/science benchmarks), MiniMax dropped M2.5 at 96% cheaper than competitors, ByteDance’s Seedance 2.0 video model went viral in China, Zhipu hiked prices 30%, and Amazon engineers revolted internally—choosing Claude Code over Amazon’s own Kiro.

So What: No single release here is the story. The story is that six shipped on the same Tuesday and nobody blinked. Model capabilities are commoditizing so fast that “best model” rotates weekly. The strategic question is shifting from “which model is best?” to “which infrastructure lets you swap models without rebuilding?”

Now What: If your AI strategy is built around a single model provider, the lock-in risk isn’t going away—it’s inverting. The moat is in your orchestration layer and data, not the model underneath.

Scott Belsky: Exponential Code Won’t Kill SaaS—It’ll Reshape Who Wins

What: Adobe CPO Scott Belsky argues that AI-generated code abundance won’t destroy enterprise software—it will make foundational infrastructure (security, data graphs, shared memory) more valuable, while “private-equity-owned niche clunkware” gets disrupted.

So What: Three big implications: (1) “Disposable software”—temporary, single-use apps—will proliferate, creating new security surface area. (2) Per-seat pricing is dead; usage-based and outcome-based models are coming. (3) The apprenticeship pipeline breaks when AI automates entry-level tasks, and companies need to deliberately rebuild knowledge transfer.

Now What: The apprenticeship point is the sleeper insight. If AI handles the grunt work that used to train junior people, who’s building the next generation of senior talent? Every enterprise needs an answer to this.

The Narrative vs. The Reality

The hype says everything is about to change. The data says the people who already changed are breaking.

Matt Shumer’s “Something Big Is Happening” Goes Mainstream

What: AI startup founder Matt Shumer’s open letter comparing AI’s current moment to February 2020 Covid went viral outside the tech bubble—mainstream media picked it up and non-technical audiences are now reading it.

So What: The capability claims are real. But the fear framing and the Covid analogy are doing all the heavy lifting. Covid happened to people—a pathogen hitting zero immunity. AI is happening for people to build with. Better analogy: the internet in 1998. Clearly going to change everything. Unclear exactly how. The people who leaned in early did fine.

Now What: When clients forward this (and they will), don’t amplify the fear or dismiss it. Translate it: which of your workflows has AI already outpaced current tools, and which are 18 months out? That’s the useful conversation.

The First Signs of AI Burnout Are Hitting the Early Adopters

What: A Berkeley Haas study of 200 employees over 9 months found that AI doesn’t reduce work—it intensifies it. Workers managed more parallel threads, checked AI outputs constantly, and revived long-deferred tasks, creating cognitive overload disguised as productivity.

So What: The study’s warning: organizations can’t distinguish genuine productivity gains from unsustainable intensity. People are losing sleep because “just one more prompt” is irresistible. Work bleeds into lunches and late evenings not because of deadlines, but because AI makes it feel like you could do more.

Now What: This is the contrarian signal in a week full of AI optimism. If your teams are adopting AI aggressively, check in on sustainability—not just output. The most engaged users may be the ones burning out fastest.

Stop Demoing AI. Start Building With It.

Blank Metal — Wed, 11 Feb 2026 21:27:38 GMT

We’re running Claude Code Labs with Anthropic — and capacity is limited.

Here’s what happens at so many enterprise “AI workshops”. . . someone presents slides about what AI could do. Maybe there’s a live demo where an engineer types a prompt and everyone nods. People leave feeling inspired and do absolutely nothing different on the next day.

We’ve watched this pattern enough times to know it doesn’t work. Inspiration without execution is just. . . time spent.

What Claude Code Labs actually are

Claude Code Labs are in-person workshops where enterprise teams spend three hours building with Claude Code — Anthropic’s command-line tool for agentic coding. Not just watching.

Blank Metal runs these in partnership with Anthropic’s Applied AI team. The format is simple: you bring your laptop, we bring the curriculum. By the end of the session, every person in the room has created real code with Claude Code — on their own machine, against real problems, with support from people who do this for a living.

Who these are for

These workshops are for technical teams and practitioners at enterprise organizations who want to move past the “should we use AI?” conversation and into the “how do we actually adopt this?” phase.

The people who get the most out of it tend to have an input into their company’s AI tooling adoption. They’re architects and engineers who are tired of evaluating tools through blog posts and vendor demos and want to feel what it’s actually like to work with Claude Code against real problems. We also encourage technical Product Managers to attend - it could change their whole approach to work.

To attend, you need basic command line familiarity, a laptop you can install the software on (that can access the internet), and a Claude Code console account.

Why we do this

We’ve been helping enterprise organizations build with AI — not as a concept, but as deployed, running software. The most common pattern we see is misalignment: leadership has bought into AI, but engineers still don’t know what to do with it day-to-day. Or they haven’t had time to really dig in and focus on learning it.

That gap doesn’t close with a webinar. It closes when people sit down and build something.

Claude Code Labs compresses that learning curve into a structured afternoon: 90-minute setup, three hours of building, and an optional hour to keep experimenting.

30 to 50 people per session. Small enough that nobody hides in the back. Large enough to make it worth your organization’s time to host.

Limited capacity

We’re running a limited number of these in partnership with Anthropic. Each one requires coordination between our team, Anthropic’s Applied AI group, and the host organization. That means limited slots, and demand fills the calendar fast.

If your organization is serious about adopting AI coding tools at scale — not as a pilot, not as a “let’s see” experiment, but as part of how your engineering team actually works — these labs are designed for you.

Get on the list!

We’re booking Claude Code Labs now for enterprise organizations. If you want to host one for your team, reach out to us here. We’ll tell you if it’s a fit and when the next available session is.

AI Real Talk Event

Blank Metal — Fri, 06 Feb 2026 19:24:02 GMT

AI Real Talk is a quarterly gathering for executives who are doing the hard work of bringing AI into their organizations — not just talking about it. They know that transforming their organization starts with honest conversation.

The format is simple: real challenges, real outcomes, and real conversation about what you haven’t figured out yet. Everything runs under Chatham House rules so people can speak freely about what they’re working on.

On April 9th, join Blank Metal and Anand Francis (Head of AI, US Bank) for a live fireside chat on how he’s driving real AI adoption inside a highly regulated enterprise—what’s working, what isn’t, and what he’s learning 90 days in.

This event is invite-only.
Sign up here to RSVP to the April 9th event

Weekly Headlines: Issue #8

Blank Metal — Fri, 06 Feb 2026 15:02:04 GMT

Anthropic Launches Claude Opus 4.6 with Finance-First Features

What: Anthropic released Claude Opus 4.6, which now tops the Finance Agent benchmark at 60.7%—a 5.5% jump from Opus 4.5—and outperforms GPT-5.2 on knowledge work tasks in finance and legal.

So What: This isn’t just another model bump. Opus 4.6 can combine regulatory filings, market reports, and internal data to produce analyses that would otherwise take analysts days. First-pass deliverables are now genuinely usable, not just rough drafts.

Now What: If your finance or legal teams are still treating AI as a research assistant, it’s time to test it as a first-draft analyst. The “vibe working” era means reviewing AI output, not creating from scratch.

Alibaba Open-Sources Speech Models That Beat GPT-4o

What: Alibaba released Qwen3-ASR, a pair of open-source speech recognition models supporting 52 languages that match or outperform GPT-4o Transcribe and Whisper-large-v3, with the smaller version achieving 92ms latency.

So What: Enterprise teams building voice interfaces, transcription pipelines, or multilingual support tools now have a high-performance open-source option that sidesteps API costs and vendor lock-in.

Now What: If you’re paying per-minute for transcription APIs or building latency-sensitive voice features, benchmark Qwen3-ASR against your current stack—the cost and control benefits could be substantial.

OpenAI Codex Mac App Now Free to Try

What: OpenAI released a native Mac desktop app for Codex, its AI coding assistant, with free trial access for ChatGPT Plus subscribers.

So What: This signals OpenAI’s push to embed AI coding tools directly into developer workflows—enterprise teams evaluating coding assistants now have another serious contender alongside GitHub Copilot and Claude.

Now What: If your engineering team is already paying for ChatGPT Plus, have a few developers test Codex against your current tooling to see if consolidation makes sense.

Codex vs. Opus Showdown Reveals the “Ur-Coding Model” Race

What: Every’s head-to-head comparison of GPT-5.3 Codex and Opus 4.6 found both models converging toward similar capabilities, with Opus excelling on complex, open-ended tasks while Codex delivers more consistent, reliable execution.

So What: The finding that matters isn’t which model won—it’s the thesis that great coding agents become great general work agents, meaning AI coding infrastructure may be foundational business infrastructure, not just a dev tools expense.

Now What: If you’re running multiple AI models in production, consider formalizing a model selection framework that matches task complexity to model strengths rather than defaulting to one provider.

Apple Brings Agentic Coding to Xcode 26.3

What: Apple’s latest Xcode update introduces agentic AI capabilities that can autonomously write, debug, and refactor code within its native development environment.

So What: This signals Apple’s serious entry into AI-assisted development tooling—enterprise teams building iOS/macOS apps now have a first-party option competing with Copilot and Cursor, potentially tightening Apple’s ecosystem lock-in further.

Now What: If your org ships Apple platform apps, evaluate whether this native integration outweighs your current third-party coding assistant—ecosystem alignment often wins on friction alone.

OpenAI Retires GPT-4o as It Doubles Down on GPT-5.2

What: Starting February 13th, ChatGPT users will lose access to GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini—though API access remains unchanged for developers.

So What: With only 0.1% of users still choosing GPT-4o daily, this signals OpenAI’s aggressive push to consolidate around newer models, reducing maintenance overhead while accelerating GPT-5.2 development.

Now What: Audit any internal tools or workflows that reference specific model versions in ChatGPT (not API)—and use this as a reminder that model availability is never guaranteed.

GitHub Brings Claude and Codex AI Agents to Its Platform

What: GitHub is integrating Anthropic’s Claude and OpenAI’s Codex as AI coding agents directly into its platform, expanding beyond its existing Copilot offering.

So What: This signals GitHub’s shift from single-vendor AI to a multi-model marketplace approach—enterprise teams may soon choose which AI agent handles their coding workflows rather than being locked into one provider.

Now What: Evaluate whether your current Copilot agreements allow flexibility to test competing agents as they become available.

A16Z Maps AI’s Winners: Leaders, Gainers, and Surprise Breakouts

What: Andreessen Horowitz published an analysis categorizing AI companies into “leaders” (dominant incumbents), “gainers” (fast-rising challengers), and “unexpected winners” (companies benefiting from AI tailwinds without being AI-native).

So What: The framework offers enterprise leaders a useful mental model for evaluating vendors and partnerships—distinguishing between established players with staying power, aggressive upstarts worth watching, and traditional companies quietly leveraging AI to pull ahead of competitors.

Now What: Use this lens when assessing your own vendor stack: are you over-indexed on “leaders” who may move slowly, or missing “gainers” who could deliver faster innovation?

Williams F1 Team Partners with Anthropic and Atlassian on AI

What: Williams Racing announced a multi-year partnership with Anthropic’s Claude and Atlassian to integrate AI across team operations, from race strategy to engineering workflows.

So What: F1 teams are data-intensive operations with split-second decision requirements—this signals enterprise AI moving into high-stakes, real-time environments where the margin for error is measured in milliseconds.

Now What: Watch how AI performs in domains where speed and precision are non-negotiable; successful use cases here could inform time-critical enterprise applications in your own operations.

China’s Kimi K2 Claims Top Open-Source LLM Crown

What: Moonshot AI released Kimi K2, a trillion-parameter open-source model that benchmarks above Claude Opus 4.5 on coding and agentic tasks, available free via API and Hugging Face.

So What: The open-source frontier is now a multi-geography race—enterprises gain another high-capability option outside US providers, but must weigh geopolitical considerations alongside performance.

Now What: If you’re building agentic workflows, benchmark Kimi K2 against your current stack—the cost-performance math on open models keeps getting more competitive.

Weekly Headlines: Issue #7

Blank Metal — Fri, 30 Jan 2026 15:03:22 GMT

Welcome to Blank Metal’s Weekly AI Headlines.

Short, sharp, and focused on impact.

Amazon’s One Medical Launches AI Health Assistant for Members

What: One Medical introduced an AI-powered health assistant that helps members get personalized answers, book appointments, and prepare for visits—all integrated with their medical records.

So What: Amazon is quietly building the AI-native healthcare stack, and this signals that consumer-facing AI health tools backed by real clinical data (not just chatbots) are becoming table stakes for healthcare operators.

Now What: If you’re in healthcare or benefits, watch how members respond to AI triage—this could reshape expectations for how employees interact with any health-adjacent enterprise service.

OpenAI and Leidos Partner to Deploy AI Across Federal Government

What: OpenAI announced a partnership with defense contractor Leidos to bring ChatGPT and agentic AI capabilities to federal government agencies, marking OpenAI’s most significant push into the public sector.

So What: This signals AI moving from pilot projects to production infrastructure in government—and Leidos’ involvement means this is about deployment at scale, not innovation theater. Enterprise vendors should expect federal AI procurement to accelerate.

Now What: If you serve federal customers, understand that AI capabilities are moving from “nice to have” to table stakes faster than procurement cycles typically allow.

Vercel Launches Marketplace for Shareable AI Agent Skills

What: Vercel released skills.sh, a marketplace for portable “skill” files that can be easily installed across multiple AI coding tools, including skills that teach one AI model how to orchestrate another.

So What: This signals a shift toward modular, composable AI tooling where enterprises can mix capabilities across models—potentially letting teams route tasks to the best-fit model rather than being locked into a single provider.

Now What: Explore whether standardized skill files could simplify how you manage AI agent capabilities across your stack, especially if you’re already juggling multiple coding assistants.

OpenAI Pulls Back the Curtain on Codex Agent Architecture

What: OpenAI published a detailed technical breakdown of how its Codex coding agent works internally, explaining the loop structure that powers its autonomous code generation.

So What: This transparency helps enterprise teams understand what’s actually happening under the hood of AI coding tools—useful for setting realistic expectations and identifying where human oversight should plug in.

Now What: Use this as a reference point when evaluating any agent-based coding tool; understanding the loop architecture helps you spot limitations before they become production problems.

Claude Gets Interactive Tools for Live Data and Code

What: Anthropic launched interactive tools that let Claude connect to Google apps, run code, create visualizations, and work with files directly within conversations.

So What: This moves Claude from chatbot to workspace—enterprise teams can now build live dashboards, analyze real-time data, and automate multi-step workflows without leaving the interface.

Now What: Audit your current workflow gaps where context-switching slows teams down; these native integrations may eliminate the need for custom middleware.

Software Engineer Argues SRE Is the Future of the Field

What: Swizec Teller makes the case that as AI handles more code generation, the real value in software engineering shifts to running and maintaining systems reliably—the domain of Site Reliability Engineering.

So What: For enterprise leaders, this suggests your AI coding investments may accelerate a talent shift: engineers who can keep complex systems running become more valuable than those who only write new code.

Now What: Audit whether your team’s skills—and hiring criteria—are weighted toward building versus operating, and adjust accordingly.

Alibaba’s Qwen-3 Becomes First AI Model to Run in Orbit

What: China’s Adaspace launched Alibaba’s Qwen-3 model on a satellite, completing a full inference cycle in under two minutes as part of a planned 2,800-satellite AI compute network.

So What: This is less about space and more about China’s long-term bet on distributed AI infrastructure—a signal that major players are thinking beyond earthbound data centers for compute capacity and resilience.

Now What: File this under “strategic awareness” rather than action items—it’s a useful reference point when evaluating where global AI infrastructure investment is heading.

MCP Gets a UI Layer: Tools Can Now Return Interactive Interfaces

What: Anthropic and partners launched MCP Apps, an extension to the Model Context Protocol that lets tools return interactive UI components—dashboards, forms, visualizations—that render directly in conversations rather than plain text.

So What: This solves a real gap in agentic workflows: instead of re-prompting for every data exploration step, users can interact with rich interfaces while keeping the AI model in the loop. The “build once, deploy across Claude, ChatGPT, VS Code” promise signals MCP maturing into genuine infrastructure.

Now What: If you’re building MCP tools, evaluate whether adding UI components could dramatically improve the user experience—especially for data-heavy or configuration-intensive workflows.

ChatGPT Can Now Analyze Your Apple Watch Health Data

What: OpenAI enabled ChatGPT to import and analyze Apple Watch health data, letting users ask questions about their sleep patterns, heart rate trends, and activity metrics in natural language.

So What: This is the first major consumer AI integration with personal health data at scale—a proving ground for how AI assistants will handle sensitive, longitudinal personal information and a preview of the “AI as personal health analyst” future.

Now What: Watch how users respond to AI having access to intimate health data. The trust patterns established here will shape enterprise health AI expectations.

OpenAI Launches Free AI Research Tool, Signals Vertical Playbook

What: OpenAI released Prism, a free AI-powered workspace for scientists built on an acquired LaTeX platform, explicitly modeling the approach Cursor and Windsurf took with code editors.

So What: The pattern matters more than the product—OpenAI is telegraphing that “acquire specialized workflow tool + add deep AI context” is the winning formula, which means every vertical-specific SaaS tool is now either a platform for this play or a target.

Now What: Audit your team’s specialized workflow tools (design, legal, finance) and ask which ones have full context of the work being done—those are where AI integration will hit hardest.