The Power Behind the Recent Anthropic and OpenAI Healthcare Moves
A personal take from a Blank Metal co-founder
When I told colleagues I was leaving my leadership role at NMDP (one of the country’s leading transplant and cellular therapy organizations) to help build an AI engineering startup, the questions were predictable: “Are you sure? Startups are risky. AI is overhyped. Why leave now?”
I left because I could see what was coming. In the past month, both Anthropic and OpenAI proved the point. Within days of each other, they moved decisively into healthcare. Anthropic launched a comprehensive platform with HIPAA-ready infrastructure, pre-built clinical connectors, and agent skills for core workflows like prior authorization.
When the two leading AI companies move simultaneously on healthcare, that’s not coincidence. It’s confirmation that the technology has crossed a fundamental threshold.
Everyone’s celebrating the tooling breakthrough, and rightfully so. But tools were never the only hard part. What matters now is what healthcare organizations do with them, and that requires a new production discipline most organizations don’t have yet.
What I Was Seeing
In late 2024, I watched our product team at NMDP experiment with tools like Claude for market research, competitive analysis, and roadmap planning. Work that typically took 8-12 weeks was getting done in a week or two. Same quality, just faster.
What really mattered wasn’t the speed. It was what product managers could do with those extra weeks: drive stakeholder alignment, define the details that actually matter, focus on work only humans can do.
Around the same time, I was working with my now co-founders to build AI platform tools for nonprofits in our free time. We were helping mission-driven organizations automate impact reporting so their teams could focus on mission work instead of wrestling with spreadsheets. I remember nonprofit leaders lighting up in meetings—not because the technology was impressive, but because they could finally imagine their teams spending time differently.
I realized this wasn’t about adding AI features to existing products. The technology was going to fundamentally change how work gets done. And healthcare, with all its regulation, complexity, and life-or-death stakes, was going to need people who knew how to actually ship solutions in production, not just run pilots.
Why This Time Is Actually Different
I’ve watched multiple waves of “AI is going to transform healthcare” hype come and go.
AI in healthcare has always been real. At NMDP, I spent five years working with machine learning systems that analyzed 30 years of transplant outcomes to help physicians make better treatment decisions. Those systems worked. They saved lives.
So when I say generative AI is fundamentally different, I’m not dismissing what came before. I’m recognizing a phase change in what’s both possible and risky.
With traditional machine learning, you test before launch and know what the model will do. With generative AI, you can't predict exactly what will happen once it's out in the wild with real people in real situations. And because these models will confidently infer information they don't actually have, you need to be extra conscious about the quality of the data they draw on.
That changes everything about what it means to ship responsibly.
The Missing Piece: Generative AI Production Discipline
A few months ago, a healthcare company walked us through a prototype: a virtual health coach chatbot that patients could interact with instead of calling a human coach. They were excited about it and wanted help getting it ready to scale.
We asked to see the underlying architecture. Within minutes, we spotted fundamental problems. No validated data sources for the LLM to draw on. No guardrails to ensure factual accuracy. Most critically, no evaluation or monitoring systems for production.
This isn’t theoretical. We’ve seen what happens when organizations get this wrong.
A patient-facing healthcare chatbot recently took in symptoms and gave dangerously inappropriate advice, recommending treatments that would have been right for a completely different patient with a different gender and condition. The provider only found out because the patient called their nurse line to report it.
In this case, the safeguards that should have monitored interactions in production simply weren't there: no systematic evaluation of question-and-answer pairs, no ongoing product review. The organization had no way of knowing how many other patients received similarly problematic advice.
This gap has kept many healthcare organizations from moving beyond pilots. You can build a compelling demo in weeks. But production-grade systems require:
Validated data sources that ground AI responses in authoritative clinical information
Real-time evaluation that catches problematic outputs before they reach patients
Continuous monitoring that surfaces patterns and edge cases as they emerge
Audit trails that meet healthcare’s regulatory and legal requirements
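As a concrete illustration of the first two requirements, a minimal guardrail might refuse to release any response that isn't grounded in a validated source, routing everything else to a human. This is a hypothetical sketch, not any vendor's API; the source names and response shape are placeholders:

```python
# Hypothetical guardrail: only release responses grounded in validated sources.
# VALIDATED_SOURCES and the response dict shape are illustrative placeholders.
VALIDATED_SOURCES = {"pubmed", "icd10", "cms_coverage"}

def review_response(response: dict) -> dict:
    """Gate a model response before it reaches a patient."""
    citations = set(response.get("citations", []))
    grounded = bool(citations) and citations <= VALIDATED_SOURCES

    if grounded:
        return {"action": "release", "text": response["text"]}
    # Anything uncited, or citing an unrecognized source, goes to a human.
    return {"action": "escalate_to_human",
            "reason": "ungrounded or unvalidated citation"}

print(review_response({"text": "Coverage summary...", "citations": ["icd10"]})["action"])
print(review_response({"text": "General advice...", "citations": []})["action"])
```

The point of the sketch is the default: when grounding can't be verified, the safe path is escalation, not release.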
Until these announcements, healthcare organizations largely had to build all of this from scratch.
What Has Changed
Both Anthropic and OpenAI announced platforms that address these production readiness gaps comprehensively. Not as afterthoughts, but as core infrastructure.
The foundation is now built in. Anthropic’s connectors to PubMed’s 35+ million peer-reviewed citations, CMS coverage databases, and authoritative clinical coding systems mean AI responses can be grounded in verified sources of truth. What used to require 6+ months of custom integration work can now happen in weeks. Validation still depends on governance, provenance, and clinical review, but the systems now have easier access to high-quality, high-value sources.
The accuracy improvements are measurable and dramatic. Without access to validated data, Claude achieves 75% accuracy on ICD-10 coding tasks—impressive for a general model, but nowhere near production-grade. With the ICD-10 connector, that jumps to 99.8%. That’s the difference between a demo and a system you can actually deploy.
But here’s what really matters for production: When Claude processes a prior authorization request, it shows exactly which connectors it used, what it validated, and how it reasoned through each step. You can see when it confirmed a diagnosis code against ICD-10, when it cross-referenced coverage requirements in the CMS database, when it flagged a discrepancy that needs human review.
That transparency provides the foundation you need to build production-grade monitoring and evaluation systems. You’re not trying to peer into a black box. You have clear audit trails showing what the system did and why.
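An audit trail like the one described above can start as an append-only structured log, one record per decision step. This is a sketch under assumptions: the record fields, step names, and connector names are illustrative, not a real schema:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Hypothetical append-only audit record for one step of a prior-auth decision.
@dataclass
class AuditRecord:
    request_id: str
    step: str                   # e.g. "confirm_diagnosis_code"
    connectors_used: list[str]  # e.g. ["icd10", "cms_coverage"]
    outcome: str                # "validated" | "flagged_for_review"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def write_audit(record: AuditRecord, log: list[str]) -> None:
    """Append one immutable JSON line; production would use durable storage."""
    log.append(json.dumps(asdict(record)))

trail: list[str] = []
write_audit(AuditRecord("PA-001", "confirm_diagnosis_code", ["icd10"], "validated"), trail)
write_audit(AuditRecord("PA-001", "check_coverage", ["cms_coverage"], "flagged_for_review"), trail)
```

Keeping each step as its own immutable record is what lets a reviewer later reconstruct exactly what the system checked and why.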
This is where the platform architecture pays off. With validated data connections and HIPAA compliance frameworks built in, you can focus your engineering resources on what actually matters: building the evaluation and monitoring systems specific to your workflows. You’re not spending six months on basic integration. You’re not wrestling with compliance from scratch. You’re building on a foundation that handles the universal hard problems so you can solve the domain-specific ones.
With generative AI in healthcare, the real quality work begins after launch. You need to catch the edge cases that testing didn’t anticipate. You need to understand how the system behaves across thousands of real patient interactions. You need to know, quickly, when something’s going wrong.
That virtual health coach prototype I mentioned? With this platform architecture, the team could now build automated tests against validated data sources that would catch dangerous mismatches in development. They could implement monitoring that flags statistical anomalies in real time, not because they built an entire data infrastructure, but because they built evaluation logic on top of an infrastructure that already exists.
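Anomaly monitoring at that step can start very simply: track the rolling rate of flagged interactions and alert when it drifts past an agreed baseline. A hypothetical sketch; the window size, minimum sample, and threshold are illustrative choices, not recommendations:

```python
from collections import deque

# Hypothetical monitor: alert when the rolling rate of flagged
# interactions exceeds a baseline threshold.
class FlagRateMonitor:
    def __init__(self, window: int = 100, threshold: float = 0.05,
                 min_samples: int = 20):
        self.events: deque[bool] = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, flagged: bool) -> bool:
        """Record one interaction; return True if the rate warrants an alert."""
        self.events.append(flagged)
        if len(self.events) < self.min_samples:  # wait for a minimal sample
            return False
        rate = sum(self.events) / len(self.events)
        return rate > self.threshold

monitor = FlagRateMonitor(window=50, threshold=0.10)
# Simulate a stream where one in five interactions is flagged (20%).
alerts = [monitor.record(i % 5 == 0) for i in range(50)]
```

The value is less in the arithmetic than in the habit: someone owns the threshold, and crossing it wakes a human up.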
We want to have the nightmares before we ship, so we can build in the right controls ahead of time. But when we can’t anticipate every nightmare, we need systems that wake us up the moment one starts. The new platform foundations make both possible.
Why I Made This Bet
I joined the other co-founders of Blank Metal because they have a track record of building products people love, including deep work in healthcare’s most complicated areas. They’re product people who understand what it takes to ship technology that works in regulated environments.
Being named as one of Anthropic’s initial Healthcare and Life Sciences partners validates that we’re approaching this work the right way. Healthcare organizations don’t just need access to powerful AI. They need partners who can help them build production-grade solutions with appropriate governance, evaluation, and safety controls.
The infrastructure is ready. The compliance frameworks are built in. The question for healthcare organizations isn’t “can AI do this?” anymore. It’s “how do we build the discipline to ship this safely and learn fast enough to stay ahead?”
Five years from now, the organizations that moved decisively in 2026 will be running health plans where prior authorizations take minutes instead of days, health systems where providers spend their time with patients instead of in documentation systems, and life sciences companies that can identify the right patients for therapies in a fraction of the time it takes today.
The ones that waited will still be trying to figure out how to get started.
Healthcare moves in waves. The window for this one just opened much wider.
Teresa Marchek is a Co-Founder and Head of Product at Blank Metal.