Backdoors Don't Care Who Walks Through
The Department of War, Anthropic, and the Fight Over Claude
This article is also published on Twitter/X.
For thirty years, governments have asked cryptographers the same question: Can you build us a backdoor that only the good guys can use?
For thirty years, the answer has been the same: No. A backdoor is a vulnerability. The moment it exists, it can be found, coerced, leaked, copied, or demanded by someone you didn’t plan for. Security properties don’t care about intent. If you break the guarantee once, you’ve changed the system.
This week, the Pentagon asked Anthropic a version of the same question, and the answer matters for the same reasons.
What actually happened
The short version: The Department of War (the Pentagon/DoD’s rebranded name under the current administration) wants Anthropic to let Claude be used for “all lawful purposes” on classified networks. Anthropic drew two red lines around the use of Claude: no mass surveillance of Americans, and no fully autonomous weapons, meaning no AI making final targeting decisions without human judgment in the loop. The Pentagon says those uses are already illegal and against standing policy, so putting them in the contract is unnecessary. Anthropic says: put it in writing anyway, because policy changes and “trust us” isn’t a technical safeguard.
Negotiations have been going on for months. This week they went public, and went sideways fast.
Secretary of Defense Pete Hegseth met with Anthropic CEO Dario Amodei on Tuesday. By Thursday, the Pentagon had sent what it called a final offer. Anthropic said the new language “was paired with legalese that would allow those safeguards to be disregarded at will.” Amodei responded publicly:
“We cannot in good conscience accede to their request.”
Pentagon spokesman Sean Parnell set the clock: 5:01 PM ET Friday. He threatened contract termination, a “supply chain risk” designation (a label typically associated with foreign-linked risk in the defense supply chain, and never before applied to an American company), and invocation of the Defense Production Act.
Then things got personal. Emil Michael, the Pentagon’s CTO and the official leading the negotiations, called Amodei a “liar” with a “God complex” on X. He accused the CEO of a safety-focused AI company of wanting to “personally control the US Military.”
Quiet negotiations no more.
The stated positions aren’t complicated. The Pentagon’s: Legality is the end user’s responsibility; no contractor dictates operational decisions. Anthropic’s: We’re not dictating operations. We’re asking for two explicit prohibitions on uses we believe no AI system should power today. The gap between those positions is where the interesting question lives.
“Constitutional AI” is why this matters
Anthropic doesn’t just equip Claude with instructions (em dashes are awesome) and slap content filters (don’t assist with causing harm) and call it a day. Their approach, which they call Constitutional AI, bakes normative and deliberate constraints into the model through training, reinforcement, and evaluation. Think of it as two layers: constraints that live in the model’s weights (how it was trained to reason) and constraints enforced when it’s being used (system policies, tool gating, refusal behavior).
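Those two layers can be sketched schematically. The sketch below is purely illustrative: the function names, the toy classifier, and the category labels are my inventions, not Anthropic's actual architecture. The point is the separation of layers: one set of constraints lives in the model itself, and another wraps every call at deployment time, regardless of who the caller is.

```python
# Illustrative sketch only. Names, categories, and the toy classifier are
# hypothetical; this is not Anthropic's implementation.

FORBIDDEN_CATEGORIES = {"mass_surveillance", "autonomous_targeting"}

def classify(request: str) -> str:
    """Toy stand-in for real intent/policy classification."""
    text = request.lower()
    if "track every citizen" in text:
        return "mass_surveillance"
    if "fire without human approval" in text:
        return "autonomous_targeting"
    return "general"

def model_answer(request: str) -> str:
    """Stand-in for the trained model, whose weights carry the first layer
    of constraints (how it was trained to reason and refuse)."""
    return f"Answer to: {request}"

def policy_gate(request: str) -> str:
    """Second layer: a deployment-time check applied to every request,
    independent of who the caller claims to be."""
    if classify(request) in FORBIDDEN_CATEGORIES:
        return "REFUSED: this request falls outside permitted use."
    return model_answer(request)
```

The design choice worth noticing is that the gate takes no `caller` argument at all: the refusal behavior is a property of the system, not of the requester's identity.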
The product Anthropic is selling, to the Pentagon and everyone else, is predictability with guardrails. Bluntly, a model that won’t cross certain lines even when pressured. That property is the whole point. It’s what makes the system trustworthy enough for high-stakes environments in the first place.
Now, hold that thought while we switch gears for a moment.
Prior lessons from the technology world
An encryption backdoor is to a cipher what removing AI guardrails is to a model constitution. “Lawful access” is the crypto version of “all lawful purposes.” And “we promise it’s only for X” is the same logic in both cases: trust us, it’s already illegal, we have policies.
The encryption community settled this argument decades ago, and the conclusion was simple: you cannot build a backdoor that only the good guys can use. Not because the good guys aren’t well-intentioned. Because the artifact, the backdoor itself, can’t know who’s using it and somehow magically behave differently.
The same logic applies. If you create a version of Claude that can do the forbidden things, even “only for the Department of War,” you introduce something that can be replicated, stolen, demanded by others, or used beyond its intended scope by others (ahem…the “bad guys”). Concretely, that “something” could be a special fine-tune, a policy-toggle that changes refusal behavior, deployment configurations that bypass tool guardrails, or access controls that skip logging. The vulnerability isn’t the intent. It’s that the “something” exists and will be used. The moment a special-exception Claude exists, you’ve proven the constraint was negotiable. And negotiable constraints aren’t constraints.
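To see why the artifact, not the intent, is the problem, consider the smallest possible “exception”: a single flag. The sketch below is hypothetical (every name is invented), but it shows the structural issue in miniature: a code path cannot verify who triggered it, so the bypass behaves identically for every caller who reaches it.

```python
# Hypothetical sketch: a "special exception" is just a code path, and a
# code path cannot verify intent. All names here are invented.

def gate(request_is_forbidden: bool, caller: str, exception_enabled: bool) -> str:
    # Intended reading: only the "good guys" ever set exception_enabled.
    # Actual behavior: anyone who can reach this flag gets the same bypass.
    # Note that `caller` is never consulted on the bypass path.
    if exception_enabled:
        return "ALLOWED"
    return "REFUSED" if request_is_forbidden else "ALLOWED"

# The flag behaves identically no matter who flips it:
assert gate(True, "dept_of_war", True) == gate(True, "adversary", True)
```

That final assertion is the whole argument in one line: the exception, once it exists, belongs to whoever can invoke it.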
Two failure modes
There’s a real tension the Pentagon is responding to, and it’s worth naming honestly.
If Claude’s safety constitution conflicts with mission-critical requests, the model might refuse, hedge, or degrade in exactly the scenarios where operators need certainty. From a defense perspective, that’s unacceptable unpredictability. Call that a reliability failure. It’s a legitimate operational concern.
But if you weaken the guardrails to avoid reliability failures, you get something worse: an integrity failure. You’ve created a precedent that constraints are optional under pressure. You’ve created a pathway (a fine-tune, a deployment toggle, a policy exception) that can be replicated. And you’ve created a prize: a less-restricted version of one of the most capable AI systems on the planet, which every adversary, agency, and leverage-wielding actor now has reason to pursue.
The Pentagon is trying to solve the reliability failure. Anthropic is warning about the integrity failure. They’re both right about their piece of it. But history is pretty clear about which failure mode is more dangerous.
Where the analogy breaks (and why that makes it worse)
I’ll be the first to say the mapping isn’t perfect. Encryption is deterministic; it’s crisp math: you either have the key or you don’t. Modern AI is by nature probabilistic and can’t carry the same precise constraints. Its guardrails can be jailbroken, fine-tuned around, or simply designed out. The boundary is fuzzier.
But that worsens the backdoor problem rather than weakening the analogy. If constraints are already squishy, then “special exceptions” accelerate a race to the bottom. Every government, every agency, every well-resourced actor will demand their own exception build. Or an adversary will just figure it out. The squishiness means the exceptions compound faster than they would in crypto, not slower.
Here’s a detail that makes this concrete: Gregory Allen, a senior fellow at the Center for Strategic and International Studies, noted on Bloomberg Radio this week that Anthropic’s existing safeguards on classified networks “have never been triggered” in actual use. The restrictions haven’t caused operational friction. This fight isn’t about a model that keeps refusing valid requests. It’s about the principle of who gets to define the constraints, and whether those constraints survive pressure.
The “just comply” take, and what it misses
There’s a strain of thinking among prominent technologists that says: It’s for national security. Just comply. Michael made this case explicitly: “At some level, you have to trust your military to do the right thing.”
But “trust us” is not a security control. It’s a policy posture. Here’s what that framing misses:
“It’s lawful” is not a stable boundary. Laws change, interpretations expand, oversight regimes vary. Worse, “lawful” is often an assertion, not a settled fact. Governments routinely authorize actions under legal theories that courts, investigators, or successor administrations later reject. The gap between “lawful” and “wise” can be enormous, and the gap between “someone said it was lawful” and “it will survive scrutiny” can be even wider. The same surveillance capabilities that are illegal domestically can be deployed abroad against populations with no legal protection at all.
Consider: “It’s for national security” is not a threat model. It’s a motivation. Threat models specify who might misuse a capability, how they’d obtain it, and what controls prevent escalation.
And once you normalize exception-making, you invite the next request. Other agencies. Other governments. Adversarial actors with leverage of all sorts. The whole point of an invariant is that it can’t be negotiated.
There’s a subtle but important framing fight underneath all of this: the Pentagon can claim “we’re not changing the product, just the terms.” Anthropic can claim “the restrictions are part of the product.” They can’t both be true. And which one you believe determines whether this looks like a contractual dispute or a request to compromise a safety architecture.
Anyone who’s shipped software recognizes the pattern. Changing the terms is the feature request. “All lawful purposes” in a contract becomes “but the contract says all lawful purposes” the next time Claude refuses something an operator wants it to do. The product doesn’t change on day one. It changes the first time someone enforces the new terms against the old guardrails.
What a sane compromise looks like
Here’s where I want to be constructive, because “no defense use ever” isn’t the right answer either. Nor is Anthropic arguing for that. They’ve been serving the military for months, and they’ve stated their desire to continue. The question is whether you can support defense without creating an exception model.
I think you can. The Pentagon could start by giving Anthropic what it’s been asking for: explicit contractual prohibitions on the two narrow red-line uses. Not a blanket restriction on military use. Two specific clauses.
Then build around it: audited deployment with immutable logging and independent review hooks; task-bounded systems where Claude handles planning, analysis, and intelligence synthesis but humans must approve any kinetic or surveillance action; red-team and evaluation gates tied to specific operational domains instead of vague “all lawful purposes” language; and classified environment hardening, which already exists, all without changing the morally-guided “constitutional” constraints. Security and permissionlessness are not the same thing.
None of these require a backdoor. All of them let the military get enormous value from Claude.
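Two of those controls, immutable logging and human approval for kinetic or surveillance actions, are concrete enough to sketch. The code below is a toy illustration with invented names, not a real DoD or Anthropic interface; “immutable” is approximated here with a hash chain, which doesn’t prevent tampering but makes it detectable on review.

```python
# Illustrative sketch: an append-only, hash-chained audit log plus a
# human-approval gate. All names and categories are hypothetical.
import hashlib
import json

class AuditLog:
    """Each entry's hash covers the previous hash, so any later edit to
    an earlier record breaks the chain and is detectable."""
    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64

    def append(self, record: dict) -> None:
        payload = json.dumps(record, sort_keys=True)  # canonical form
        entry_hash = hashlib.sha256((self._last_hash + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": entry_hash})
        self._last_hash = entry_hash

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if e["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

SENSITIVE = {"kinetic", "surveillance"}

def execute(action: str, category: str, human_approved: bool, log: AuditLog) -> str:
    """Planning and analysis run freely; sensitive actions hard-require a
    human approval flag, and everything is logged either way."""
    log.append({"action": action, "category": category, "approved": human_approved})
    if category in SENSITIVE and not human_approved:
        return "BLOCKED: human approval required"
    return "EXECUTED"
```

Note that the block happens at the action layer, not by weakening the model: the constitutional constraints stay intact, and the gate sits in front of the irreversible step.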
The real question
This fight isn’t about “wokeness versus warfighters.” (Though that framing is doing heavy lifting for people who’d rather not engage with the substance.) It’s about something more fundamental: Who gets to define, and override, the invariants of powerful general-purpose systems?
If the answer is “whoever has the biggest stick,” then Constitutional AI becomes marketing copy. And the world gets the AI-equivalent of broken cryptography: deployed everywhere, trusted by everyone, compromised from the start.
The supply chain risk designation makes this especially clear. It’s a label thus far reserved for foreign adversaries in the defense supply chain. Using it against a domestic company for maintaining safety standards inverts its entire purpose. It tells every AI company: your safety commitments are a liability, not an asset. Drop them before you’re forced to.
That’s not a national security strategy. It’s a recipe for a tenuous future.
The same lesson, higher stakes
If you wouldn’t ship backdoored encryption to your allies, don’t ship a backdoored AI model. Don’t silently create one, either. The logic is the same. The exception doesn’t care about your intentions, or who’s prompting. And once the exception exists, it belongs to everyone who can find it, coax it out of the model, and propagate it.
We can build AI systems that are powerful and bounded. Or we can build systems that are powerful, politically pliable, and eventually ubiquitous in the hands of whoever copies the exception first.
The cryptographers figured this out generations ago, and rightfully, logically, stood their ground. The question now is whether the people building and deploying AI, and the governments demanding access to it, are willing to learn from that history. Or whether we’re going to spend the next decade rediscovering the same lesson with higher stakes and fewer do-overs.


