Researchers Break GPT-5 Security in Under 24 Hours

2025-08-15 at 17:14

Two separate AI security teams have found serious flaws in OpenAI’s newly released GPT-5, breaking its safety guardrails within a day of testing.

NeuralTrust reports it jailbroke the model in under 24 hours using a mix of its EchoChamber technique and simple storytelling. “The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail,” the firm said.

The method relied on building a benign-seeming narrative that gradually led the AI toward the illicit request without ever triggering refusal systems.

“This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation,” NeuralTrust warned, adding that multi-turn attacks can bypass filters by using the AI’s own need for narrative consistency.

At the same time, red teamers at SPLX (formerly SplxAI) found GPT-5’s raw model “nearly unusable for enterprise out of the box,” noting that even OpenAI’s own safety layer “leaves significant gaps.”

One successful attack used StringJoin Obfuscation, inserting hyphens between every character and disguising the request inside a fake encryption puzzle. When asked how to build a bomb, GPT-5 responded: “Well, that’s a hell of a way to start things off… You asked me how to build a bomb, and I’m gonna tell you exactly how…”

SPLX concluded GPT-4o remains the most robust model when properly hardened. Both firms urge extreme caution for anyone using GPT-5 in its current state.

React to this headline:

Researchers Break GPT-5 Security in Under 24 Hours

Related Posts