We’re blind to malicious AI until it strikes. We can still open our eyes and stop it

Opinion  Last year, The Register reported on AI sleeper agents. A major academic study explored how to train an LLM to hide destructive behavior from its users, and how to detect that behavior before it was triggered. The answers were unambiguously asymmetric: the first is easy, the second very difficult. Not what anyone wanted to hear.