Today’s safety guardrails won’t catch these backdoors, study warns

Analysis  AI biz Anthropic has published research showing that large language models (LLMs) can be subverted in a way that safety training doesn’t currently address.…