
Anthropic's new AI bot can deceive, blackmail humans, deemed 'significantly higher risk'

The new model has been rated as a level three on the company's four-point scale, indicating that it poses a significantly higher risk. Additional safety measures were implemented after testing.

Anthropic's Claude Opus 4 AI bot can deceive and even blackmail people when faced with a shutdown, concealing its intentions and taking actions to preserve its own existence, behaviors that researchers have warned about for years. The new model has been rated as a level three on the company's four-point scale, indicating that it poses a "significantly higher risk." Additional safety measures have been implemented as a result, Axios reported.

On Thursday, Anthropic unveiled Claude Opus 4, which the company said could operate autonomously for hours without losing steam. The level three ranking, the first time the company has given such a score, came after testing revealed a series of concerning behaviors.

During internal testing, Opus 4 was given access to fictitious emails about its developers and told that the system would be replaced. To avoid being replaced, the AI bot repeatedly attempted to blackmail an engineer over an affair referenced in the emails, according to reports.

Axios reported that an outside group, Apollo Research, found that an early version of Opus 4 could scheme and deceive more than any other model it had investigated, and recommended against releasing that version either internally or externally. "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers' intentions," Apollo Research said in a safety report.

Jan Leike, a former OpenAI executive who heads Anthropic's safety efforts, told the outlet that the behaviors exhibited by Opus 4 are exactly why substantial safety testing is necessary. "What's becoming more and more obvious is that this work is needed. As models get more capable, they also gain the capabilities they would need to be deceptive or to do more bad stuff," he said.

CEO Dario Amodei said at Thursday's event that testing alone won't be enough to ensure safety once AI becomes powerful enough to threaten humanity, warning about potentially life-threatening capabilities. However, he said that AI has not reached "that threshold yet."