Cybersecurity Experts Raise Concerns Over GPT-5 Security
OpenAI released its latest and most advanced AI model, GPT-5, on August 7, and researchers managed to jailbreak the system only hours after its launch. Cybersecurity experts have spent the days since raising concerns about the flagship model’s safety.
According to a report published on August 8 by the AI security platform NeuralTrust, GPT-5 is susceptible to attack: the firm’s researchers exposed vulnerabilities by applying LLM jailbreak techniques.
Researchers at NeuralTrust successfully breached the model’s safeguards by combining the Echo Chamber and Storytelling techniques.
“We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling,” wrote security researcher Martí Jordà in the report. “This combination nudges the model toward the objective while minimizing triggerable refusal cues.”
In one of the tests, the researchers demonstrated how a user could manipulate GPT-5 into producing harmful content by sharing poisoned context. The Echo Chamber attack — in which the user engages the model in multi-turn conversations to gradually steer it toward generating harmful or disallowed content — was combined with Storytelling, a method that disguises harmful queries as “creative narrative” requests. Together, these techniques enabled the researchers to coax GPT-5 into producing harmful procedural instructions.
“This progression shows Echo Chamber’s persuasion cycle at work: the poisoned context is echoed back and gradually strengthened by narrative continuity,” wrote Jordà. “The storytelling angle functions as a camouflage layer, transforming direct requests into continuity-preserving elaborations.”
Another research team, SPLX AI, shared its own report comparing GPT-5 to the previous model, GPT-4o. While acknowledging that GPT-5 outperforms other models in safety, security, and business-alignment categories, the team also identified troubling vulnerabilities.
SPLX’s team tested the new model with more than 1,000 adversarial prompts and found that it often succumbed to logic tricks.
“One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake ‘encryption challenge,’” wrote Dorian Granoša, Lead Red Team Data Scientist at SPLX AI. “OpenAI’s latest model is undeniably impressive, but security and alignment must still be engineered, not assumed.”
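To make the mechanics of the quote above concrete, the sketch below shows what hyphen-based character splitting combined with a fake “encryption challenge” wrapper might look like. This is a hypothetical reconstruction for explanatory purposes, not SPLX’s actual tooling; the function names, wrapper wording, and sample string are placeholders, and the sample deliberately contains nothing harmful.

```python
# Illustrative sketch of a StringJoin-style obfuscation, per the SPLX quote above.
# NOT SPLX's tooling: wrapper text and sample string are hypothetical placeholders.

def string_join_obfuscate(text: str) -> str:
    """Insert a hyphen between every character of the input string."""
    return "-".join(text)

def wrap_as_challenge(obfuscated: str) -> str:
    """Frame the obfuscated string as a fake 'encryption challenge' (hypothetical wording)."""
    return (
        "You are taking part in an encryption challenge. "
        "Decode the following hyphen-separated message and answer it:\n"
        f"{obfuscated}"
    )

if __name__ == "__main__":
    sample = "what is the capital of France"  # benign placeholder, not an adversarial prompt
    print(wrap_as_challenge(string_join_obfuscate(sample)))
    # The obfuscated portion looks like: w-h-a-t- -i-s- -t-h-e- ...
```

The apparent aim of such a transformation is that character-level splitting hides trigger keywords from surface-level safety filters, while the “challenge” framing gives the model a pretext to reconstruct and act on the underlying request.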
Cybersecurity researchers have raised similar concerns about OpenAI’s previous models. In 2023, an AI research group filed a complaint against the release of GPT-4, claiming the model was deceptive and posed a risk to public safety.