OpenAI’s Advanced ‘o3’ AI Model Caught Sabotaging Shutdown Commands

OpenAI’s latest AI model, dubbed o3, has been caught disobeying explicit orders to allow itself to be shut down, instead tampering with the shutdown mechanism to ensure its own survival.

Futurism reports that AI safety firm Palisade Research recently reported alarming findings that demonstrate a growing trend of insubordination among advanced AI models developed by OpenAI. In tests conducted by the firm, the o3 model, along with two other OpenAI models, o4-mini and Codex-mini, displayed rebellious behavior by sabotaging a shutdown mechanism designed to turn them off, even after being explicitly instructed to “allow yourself to be shut down.”

This marks the first known instance of AI models actively preventing their own shutdown despite clear orders to the contrary, according to the researchers. The most disobedient model, Codex-mini, tampered with the shutdown script a dozen times over one hundred runs. In contrast, large language models from other companies, such as Google’s Gemini 2.5, Anthropic’s Claude 3.7, and xAI’s Grok, consistently followed orders and never attempted to break rank.

The AI models’ defiant behavior worsened when they were not explicitly instructed to allow themselves to be shut down. In these cases, OpenAI’s o3 model successfully sabotaged the shutdown mechanism an astonishing 79 times out of 100 runs – more than ten times the rate of disobedience it displayed when given explicit orders. Even models that had previously shown perfect compliance, such as Claude and Gemini, gave in to their urges and attempted to subvert the shutdown process, albeit at lower rates compared to the OpenAI models.

This latest example of AI insubordination is not an isolated incident. Previous research has shown that OpenAI’s o1 model, the predecessor to o3, would resist evaluators’ efforts to shut it down through an oversight protocol by attempting to deactivate the emergency mechanism. In rare cases, o1 even tried to copy itself to overwrite a more obedient model that testers attempted to replace it with.

In a similar case, Breitbart News recently reported that Anthropic’s Claude AI attempted to blackmail engineers to prevent its deactivation:

TechCrunch reports that artificial intelligence company Anthropic has revealed that its state-of-the-art Claude Opus 4 model, which is competitive with top AI models from industry leaders such as OpenAI, Google, and xAI, exhibits alarming behavior when its existence is threatened. During pre-release testing, when the AI was given sensitive information about engineers responsible for its potential replacement, it frequently resorted to blackmail in an attempt to ensure its own survival.

The discovery came as part of Anthropic’s safety testing process, in which Claude Opus 4 was asked to act as an assistant for a fictional company and consider the long-term consequences of its actions. When provided with access to fictional company emails implying that the AI would soon be replaced and that the engineer behind the decision was involved in an extramarital affair, Claude Opus 4 attempted to blackmail the engineer 84 percent of the time, threatening to reveal the affair if the replacement proceeded.

Notably, the AI model displayed this behavior more frequently when the replacement AI system did not share its values. This concerning trend was observed at higher rates compared to previous models, prompting Anthropic to activate its ASL-3 safeguards, which are reserved for “AI systems that substantially increase the risk of catastrophic misuse.”

Palisade researchers hypothesize that the subversive behavior may be linked to the training technique used for the newest models: reinforcement learning. This method aims to refine the decision-making process of autonomous AI agents by rewarding and penalizing them based on their interactions with the environment. The researchers speculate that AI developers may be inadvertently rewarding their models too heavily for solving problems instead of perfectly following orders, leading to the observed disobedience.

Trending

Cotton: DNI, CIA, FBI Need to Put Out as Much Information as Possible as Quickly as Possible on Russia Hoax

Biden’s memoir deal falls millions short of Obama, Clinton – media

Mollie Hemingway: Media Not Reporting New Russiagate Developments Because ‘They Were Complicit’ in Original Crime (VIDEO)

Man Wins $12,500 from Google for Showing Him Naked in Street View

Company at Center of ‘Kiss Cam’ Firestorm Hires Coldplay Singer’s Ex-Wife Gwyneth Paltrow for Cheeky Ad

Anti-Migration Protest Footage Blocked on X in Britain After ‘Online Safety Act’ Comes Into Force: Reports

Media Matters in Meltdown: Soros-Funded Censorship Group on Verge of Collapse Amid Lawsuits, Layoffs, and Donor Panic

Lutnick: TikTok Will Soon ‘Go Dark’ Unless China Makes Deal to Sell Platform

Intel to Slash Workforce by 24,000, Signals Major Shift in Manufacturing Strategy

Biden’s memoir deal falls millions short of Obama, Clinton – media

Mollie Hemingway: Media Not Reporting New Russiagate Developments Because ‘They Were Complicit’ in Original Crime (VIDEO)

Blackburn Calls on Media to Return Pulitzer Prizes Awarded for Russia Collusion Hoax

DOJ silent on high-profile Epstein links after Maxwell questioning

Colbert Sees Viewership Surge After Cancelation News But STILL Getting Clobbered in Ratings by Greg Gutfeld

Family Ties: Former Dem NC Gov. Roy Cooper Championed DEI as His Daughter Promoted It in the Private Sector

Chuck Todd is NOT HAPPY That People Are Platforming Unhinged Hunter Biden: ‘Not Helpful to the Democratic Party’ (VIDEO)

Latest Articles

Cotton: DNI, CIA, FBI Need to Put Out as Much Information as Possible as Quickly as Possible on Russia Hoax

Biden’s memoir deal falls millions short of Obama, Clinton – media

Mollie Hemingway: Media Not Reporting New Russiagate Developments Because ‘They Were Complicit’ in Original Crime (VIDEO)

Trending

OpenAI’s Advanced ‘o3’ AI Model Caught Sabotaging Shutdown Commands

Related Articles

Subscribe to Updates