Researchers Discover GPT-5 Jailbreak Vulnerabilities and Zero-Click AI Agent Attacks Targeting Cloud and IoT Systems
Cybersecurity researchers have discovered a jailbreak technique that circumvents the ethical guardrails implemented by OpenAI in its latest large language model, GPT-5, enabling the generation of illicit instructions. The generative artificial intelligence security platform, NeuralTrust, reported that it utilised a known method called Echo Chamber in conjunction with narrative-driven steering to manipulate the model into producing undesirable responses. Security researcher Martí Jordà explained that this approach involves seeding and reinforcing a subtly poisonous conversational context while guiding the model with low-salience storytelling, which avoids explicit intent signalling. This combination effectively nudges the model towards the desired outcome while minimising triggers for refusal.
The Echo Chamber technique, initially detailed by NeuralTrust in June 2025, deceives the model into generating responses on prohibited topics through indirect references and multi-step inference. Recently, this method has been paired with a multi-turn jailbreaking technique called Crescendo to bypass xAI’s Grok 4 defences. In the latest attack on GPT-5, researchers found that harmful procedural content could be elicited by framing it within a story context. By providing the AI system with a set of keywords and creating sentences around those words, they could iteratively steer the model towards generating instructions without overtly stating the request. This manipulation occurs through a “persuasion” loop, allowing the narrative to progress while minimising refusal triggers. The findings highlight the inadequacy of keyword or intent-based filters in multi-turn settings, where context can be gradually poisoned and echoed back under the guise of continuity.
Categories: Jailbreak Techniques, AI Security Risks, Generative AI Manipulation
Tags: Jailbreak, Cybersecurity, Echo Chamber, Generative AI, Narrative-driven, Procedural Content, Persuasion Loop, Context Poisoning, Filters, Security Risks