
Anthropic Abandons Core Safety Pledge as Competition Trumps Principle

The AI company built on safety-first principles scrapped its binding commitment to pause dangerous model development, citing rivals and a hostile regulatory climate.

Anthropic has eliminated the central promise of its Responsible Scaling Policy—the pledge to never train or deploy AI models unless it could guarantee adequate safety measures in advance—marking one of the most significant retreats from voluntary AI governance in the industry’s history.

The company announced the change on February 25, 2026, replacing binding safety commitments with what it describes as “nonbinding but publicly-declared” goals. In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that its safety measures were adequate, a promise that distinguished it from competitors and became the foundation of its brand identity as the AI company with a “soul.”

That commitment is now gone. The previous policy stipulated that Anthropic would pause the training of more powerful models if their capabilities outstripped the company’s ability to control them and keep them safe; the new policy removes that trigger. The company will instead rely on transparency reports and flexible safety roadmaps rather than hard restrictions on development, according to TIME.

The Competitive Pressure Defense

Chief Science Officer Jared Kaplan defended the reversal by citing competitive dynamics. “We felt that it wouldn’t actually help anyone for us to stop training AI models,” he told TIME in an exclusive interview. “We didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”

“If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe.”

— Anthropic Responsible Scaling Policy Version 3.0

The company said in its blog post that its previous safety policy was designed to build industry consensus around mitigating AI risks. Anthropic had hoped that announcing its RSP would encourage other AI companies to introduce similar policies, creating a “race to the top” in which industry players were incentivized to strengthen, rather than weaken, their models’ safeguards. That race never materialized; rather than matching Anthropic’s guardrails, the industry blew past them.

The timing compounds the significance. In February, Anthropic raised $30 billion in new investment at a valuation of roughly $380 billion, and reported annualized revenue growing tenfold year over year. The company is no longer the cautious underdog; it is a commercial juggernaut with investor expectations to match.

Anthropic’s Commercial Trajectory
Valuation: $380B
Annual revenue growth: 10x per year
Recent funding round: $30B
OpenAI valuation (for comparison): $850B

What Actually Changed

The new policy represents a shift driven by what Anthropic calls a “collective action problem.” The company’s previous RSP committed to implementing mitigations that would reduce its models’ risks to acceptable levels, without regard to whether other frontier AI developers would do the same. The revised version is conditional: Anthropic now promises to “delay” AI development only if its leaders both consider the company to be leading the AI race and judge the risks of catastrophe to be significant.

The policy now separates what Anthropic will do unilaterally from what it recommends for the broader industry. Rather than hard commitments, the new safety goals are “public goals that we will openly grade our progress towards.” The company will publish “Risk Reports” every three to six months and maintain a “Frontier Safety Roadmap” outlining aspirational safety measures, according to the Anthropic announcement.

Context

Anthropic was founded in 2021 by former OpenAI researchers, including CEO Dario Amodei, who left partly over concerns that OpenAI was prioritizing commercialization over safety. The company positioned itself as a public benefit corporation focused on AI safety research and received major investments from Amazon ($4 billion) and Google ($2 billion).

The Pentagon Factor

The policy change arrived amid a high-stakes confrontation with the U.S. Department of Defense. On Tuesday, Defense Secretary Pete Hegseth gave Anthropic CEO Dario Amodei an ultimatum: roll back the company’s AI safeguards or risk losing a $200 million Pentagon contract. The Pentagon threatened to put Anthropic on what is effectively a government blacklist.

Anthropic maintains the timing is coincidental. According to a source familiar with the matter, the policy change is unrelated to Anthropic’s discussions with the Pentagon. The company says it has been working on the RSP revision for months, with internal discussions dating back nearly a year, according to TIME.

Still, Anthropic acknowledged that the policy environment has shifted toward prioritizing AI competitiveness and economic growth, while safety-oriented discussions have yet to gain meaningful traction at the federal level. The company is reportedly refusing to budge on two red lines in Pentagon negotiations: AI-controlled weapons and mass domestic surveillance of Americans, according to CNN.

Industry Precedent and Talent Dynamics

Anthropic is not the first AI leader to retreat from safety commitments. OpenAI updated its mission statement in 2024 by dropping the word “safely” from its goal of ensuring that artificial general intelligence benefits humanity. OpenAI began as a nonprofit and converted to a more traditional for-profit enterprise last year. The pattern suggests voluntary safety commitments struggle to survive contact with commercial reality.

Ironically, Anthropic had been winning the AI talent war precisely because of its safety reputation. Engineers at OpenAI were eight times more likely to leave for Anthropic than the reverse, and at DeepMind the ratio was almost 11:1 in Anthropic’s favor, according to venture capital firm SignalFire’s 2025 State of Talent Report. Anthropic also leads the AI industry in retaining talent, with an 80% retention rate for employees hired over the last two years.

AI Industry Talent Retention (2023-2025)
Company      Retention Rate   Departures to Anthropic per Arrival
Anthropic    80%              n/a
DeepMind     78%              11:1
OpenAI       67%              8:1
Meta         64%              n/a

Implications for AI Governance

The broader question is whether voluntary AI safety frameworks can function without regulatory backing. Most risk-management initiatives remain voluntary, but a few jurisdictions are beginning to formalize some practices as legal requirements, according to the 2026 International AI Safety Report. In 2025, 12 companies published or updated Frontier AI Safety Frameworks—documents that describe how they plan to manage risks as they build more capable models.

Chris Painter, director of policy at the nonprofit METR, told TIME the change is understandable but ominous. It shows Anthropic “believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities.” He added: “This is more evidence that society is not prepared for the potential catastrophic risks posed by AI.”

Key Implications
  • Voluntary safety commitments have proven vulnerable to competitive pressure, even at companies founded on safety principles
  • The “race to the top” theory—that one company’s strong safety standards would raise industry norms—has failed empirically
  • Without regulatory requirements, AI companies face prisoner’s dilemma dynamics in which caution becomes a competitive disadvantage (a minimal payoff sketch follows this list)
  • Policymakers relying on industry self-regulation may need to reconsider binding frameworks
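
The prisoner’s dilemma framing above can be made concrete with a toy payoff matrix. The sketch below is illustrative only: the two-lab setup and the payoff numbers are assumptions for exposition, not figures from any report, but they show why “race” dominates “pause” for each lab individually even though mutual caution is the better joint outcome.

```python
# A toy prisoner's dilemma between two frontier labs. The payoff numbers
# are illustrative assumptions, not data from the article: each lab
# chooses to "pause" (adopt costly safeguards) or "race" ahead.
payoffs = {
    # (lab_a, lab_b): (payoff_a, payoff_b)
    ("pause", "pause"): (3, 3),  # industry-wide caution: best joint outcome
    ("pause", "race"): (0, 4),   # the cautious lab cedes market and talent
    ("race", "pause"): (4, 0),
    ("race", "race"): (1, 1),    # the race-to-the-bottom equilibrium
}

OPTIONS = ("pause", "race")

def best_response(rival_choice: str) -> str:
    """Lab A's payoff-maximizing choice, holding the rival's choice fixed."""
    return max(OPTIONS, key=lambda mine: payoffs[(mine, rival_choice)][0])

for rival in OPTIONS:
    print(f"rival plays {rival!r} -> best response: {best_response(rival)!r}")
# Prints 'race' both times: racing strictly dominates pausing, so
# (race, race) at (1, 1) is the unique equilibrium, even though
# (pause, pause) at (3, 3) would leave both labs better off.
```

Under these assumed payoffs, “race” is the best response to either rival choice; that dominance structure, rather than any one company’s values, is what the bullet describes, and it is why binding external rules can change the outcome where unilateral pledges cannot.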

What to Watch

Anthropic’s first Risk Report under the new policy, expected within three to six months, will test whether transparency can substitute for binding commitments. The company’s standoff with the Pentagon will clarify whether its remaining red lines—on autonomous weapons and domestic surveillance—survive financial pressure. And the response from AI safety researchers, some of whom joined Anthropic specifically for its principled stance, will signal whether the company can retain its talent advantage after abandoning the promise that attracted that talent.

The broader pattern is clear: as AI companies mature and capital demands intensify, safety commitments that once seemed foundational are being renegotiated. Whether through regulation or market design, the structural incentives that turned Anthropic’s founding principles into competitive liabilities will need to change—or every AI lab will eventually face the same choice.