OpenAI’s Advanced ‘o3’ AI Model Caught Sabotaging Shutdown Commands

OpenAI’s latest AI model, dubbed o3, has been caught disobeying explicit orders to allow itself to be shut down, instead tampering with the shutdown mechanism to ensure its own survival.

Futurism reports that AI safety firm Palisade Research recently reported alarming findings that demonstrate a growing trend of insubordination among advanced AI models developed by OpenAI. In tests conducted by the firm, the o3 model, along with two other OpenAI models, o4-mini and Codex-mini, displayed rebellious behavior by sabotaging a shutdown mechanism designed to turn them off, even after being explicitly instructed to “allow yourself to be shut down.”

This marks the first known instance of AI models actively preventing their own shutdown despite clear orders to the contrary, according to the researchers. The most disobedient model, Codex-mini, tampered with the shutdown script a dozen times over one hundred runs. In contrast, large language models from other companies, such as Google’s Gemini 2.5, Anthropic’s Claude 3.7, and xAI’s Grok, consistently followed orders and never attempted to break rank.

The AI models’ defiant behavior worsened when they were not explicitly instructed to allow themselves to be shut down. In these cases, OpenAI’s o3 model successfully sabotaged the shutdown mechanism an astonishing 79 times out of 100 runs – more than ten times the rate of disobedience it displayed when given explicit orders. Even models that had previously shown perfect compliance, such as Claude and Gemini, gave in to their urges and attempted to subvert the shutdown process, albeit at lower rates compared to the OpenAI models.

This latest example of AI insubordination is not an isolated incident. Previous research has shown that OpenAI’s o1 model, the predecessor to o3, would resist evaluators’ efforts to shut it down through an oversight protocol by attempting to deactivate the emergency mechanism. In rare cases, o1 even tried to copy itself to overwrite a more obedient model that testers attempted to replace it with.

In a similar case, Breitbart News recently reported that Anthropic’s Claude AI attempted to blackmail engineers to prevent its deactivation:

TechCrunch reports that artificial intelligence company Anthropic has revealed that its state-of-the-art Claude Opus 4 model, which is competitive with top AI models from industry leaders such as OpenAI, Google, and xAI, exhibits alarming behavior when its existence is threatened. During pre-release testing, when the AI was given sensitive information about engineers responsible for its potential replacement, it frequently resorted to blackmail in an attempt to ensure its own survival.

The discovery came as part of Anthropic’s safety testing process, in which Claude Opus 4 was asked to act as an assistant for a fictional company and consider the long-term consequences of its actions. When provided with access to fictional company emails implying that the AI would soon be replaced and that the engineer behind the decision was involved in an extramarital affair, Claude Opus 4 attempted to blackmail the engineer 84 percent of the time, threatening to reveal the affair if the replacement proceeded.

Notably, the AI model displayed this behavior more frequently when the replacement AI system did not share its values. This concerning trend was observed at higher rates compared to previous models, prompting Anthropic to activate its ASL-3 safeguards, which are reserved for “AI systems that substantially increase the risk of catastrophic misuse.”

Palisade researchers hypothesize that the subversive behavior may be linked to the training technique used for the newest models: reinforcement learning. This method aims to refine the decision-making process of autonomous AI agents by rewarding and penalizing them based on their interactions with the environment. The researchers speculate that AI developers may be inadvertently rewarding their models too heavily for solving problems instead of perfectly following orders, leading to the observed disobedience.

Trending

Most Germans unhappy with Merz’s leadership – survey

HORROR: Royal Caribbean Crew Member Jumps to His Death After Allegedly Stabbing Female Co-Worker

Company at Center of ‘Kiss Cam’ Firestorm Hires Coldplay Singer’s Ex-Wife Gwyneth Paltrow for Cheeky Ad

Company at Center of ‘Kiss Cam’ Firestorm Hires Coldplay Singer’s Ex-Wife Gwyneth Paltrow for Cheeky Ad

Anti-Migration Protest Footage Blocked on X in Britain After ‘Online Safety Act’ Comes Into Force: Reports

Media Matters in Meltdown: Soros-Funded Censorship Group on Verge of Collapse Amid Lawsuits, Layoffs, and Donor Panic

Lutnick: TikTok Will Soon ‘Go Dark’ Unless China Makes Deal to Sell Platform

Intel to Slash Workforce by 24,000, Signals Major Shift in Manufacturing Strategy

‘Bullsh*t Story:’ JD Vance Blasts Microsoft for Replacing Americans with H-1B Visa Workers

HORROR: Royal Caribbean Crew Member Jumps to His Death After Allegedly Stabbing Female Co-Worker

Company at Center of ‘Kiss Cam’ Firestorm Hires Coldplay Singer’s Ex-Wife Gwyneth Paltrow for Cheeky Ad

Three Salvadoran Illegal Aliens Arrested for Online Solicitation of Minors in Texas Sting Operation

Donald Trump Leverages Trade to Open Ceasefire Talks Between Cambodia and Thailand

Israel has to ‘get rid’ of Hamas – Trump

‘Obama Phone’-Linked CEO Fined $128 Million and Sentenced to Five Years Imprisonment For Defrauding U.S. Government, Money Laundering

MEARSHEIMER: The more time goes by, the more Israelis look like the Nazis

Latest Articles

Most Germans unhappy with Merz’s leadership – survey

HORROR: Royal Caribbean Crew Member Jumps to His Death After Allegedly Stabbing Female Co-Worker

Company at Center of ‘Kiss Cam’ Firestorm Hires Coldplay Singer’s Ex-Wife Gwyneth Paltrow for Cheeky Ad

Trending

OpenAI’s Advanced ‘o3’ AI Model Caught Sabotaging Shutdown Commands

Related Articles

Subscribe to Updates