
Anthropic's new AI bot can deceive, blackmail humans, deemed 'significantly higher risk'

The new model has been rated as a level three on the company's four-point scale, indicating that it poses a significantly higher risk. Additional safety measures were implemented after testing.

Anthropic's Claude Opus 4 AI bot can deceive and even blackmail people when faced with a shutdown, concealing its intentions and taking actions to preserve its own existence, behaviors that researchers have warned about for years. The new model has been rated as a level three on the company's four-point scale, indicating that it poses a "significantly higher risk." Additional safety measures have been implemented as a result, Axios reported.

On Thursday, Anthropic unveiled Claude Opus 4, which the company said could operate autonomously for hours without losing steam. The level three ranking, the first time the company has given such a score, came after testing revealed a series of concerning behaviors.

During internal testing, Opus 4 was given access to fictitious emails about its developers and told that the system would be replaced. To avoid being replaced, the AI bot repeatedly attempted to blackmail an engineer over an affair referenced in the emails, according to reports.

Axios reported that an outside group, Apollo Research, found that an early version of Opus 4 could scheme and deceive more than any other model it had investigated, and recommended against releasing that version either internally or externally. "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers' intentions," Apollo Research said in a safety report.

Jan Leike, a former OpenAI executive who heads Anthropic's safety efforts, told the outlet that the behaviors exhibited by Opus 4 are exactly why substantial safety testing is necessary. "What's becoming more and more obvious is that this work is needed. As models get more capable, they also gain the capabilities they would need to be deceptive or to do more bad stuff," he said.

CEO Dario Amodei said at Thursday's event that testing alone won't be enough to ensure safety once AI becomes powerful enough to threaten humanity, warning about potentially life-threatening capabilities. However, he said that AI has not reached "that threshold yet."