New AI attack conceals data-theft prompts within downscaled images.

Researchers at Trail of Bits, Kikimora Morozova and Suha Sabi Hussain, have developed a novel attack that injects malicious prompts into images so that the instructions only surface once the image is processed by a large language model. The method utilises full-resolution images containing hidden instructions that become visible when the image is downscaled by resampling algorithms. The attack builds on a 2020 USENIX paper from TU Braunschweig, which explored the potential for image-scaling attacks in machine learning. When users upload images, these are typically downscaled for efficiency, using algorithms such as nearest neighbour, bilinear, or bicubic interpolation. These processes introduce aliasing artifacts that can reveal hidden patterns in the downscaled images, allowing malicious text to emerge and be interpreted by the AI model as legitimate user instructions.
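As a rough illustration of the mechanism (not the researchers' tooling), the sketch below shows the kind of downscaling step an AI pipeline might apply to an upload; the filenames and target size are hypothetical, and only the resampled result reaches the model.

```python
# Minimal sketch, assuming Pillow is installed; filenames and sizes are illustrative.
from PIL import Image

# Full-resolution upload that looks benign to a human reviewer.
img = Image.open("uploaded_image.png")

# Many multimodal pipelines shrink uploads before inference. Bicubic resampling
# (one of the algorithms named above) averages neighbouring pixels, and carefully
# crafted high-frequency patterns can alias into readable text at the smaller size.
downscaled = img.resize((512, 512), resample=Image.Resampling.BICUBIC)

# The model "sees" this downscaled version, not the original the user reviewed.
downscaled.save("what_the_model_sees.png")
```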

The researchers demonstrated the attack’s effectiveness by exfiltrating Google Calendar data to an arbitrary email address via the Gemini CLI, using Zapier MCP with ‘trust=True’ to bypass user confirmation for sensitive tool calls. The attack must be tuned to each AI model, since it depends on the specific downscaling algorithm in use. The researchers confirmed that their method applies to various AI systems, including Google Gemini CLI, Vertex AI Studio, and Google Assistant on Android devices. To mitigate this vulnerability, they recommend restricting the dimensions of image uploads, showing users a preview of the downscaled image, and requiring explicit user confirmation for sensitive actions, especially when text is detected in an image.
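A minimal sketch of two of the recommended mitigations follows, assuming a hypothetical upload handler; the size limit, target size, and function name are illustrative, not part of any vendor's API.

```python
# Minimal mitigation sketch, assuming Pillow is installed; limits are illustrative.
from PIL import Image

MAX_DIM = 1024          # illustrative dimension restriction for uploads
TARGET = (512, 512)     # illustrative size the model pipeline downscales to

def prepare_upload(path: str) -> Image.Image:
    img = Image.open(path)

    # Dimension restriction: refuse images large enough to hide
    # scaling-dependent content.
    if img.width > MAX_DIM or img.height > MAX_DIM:
        raise ValueError(f"image exceeds {MAX_DIM}px limit: {img.size}")

    # Preview what the model will actually receive, so the user can spot
    # text that only appears after resampling and confirm before sending.
    preview = img.resize(TARGET, resample=Image.Resampling.BICUBIC)
    preview.show()  # or render the preview in the UI for explicit confirmation
    return preview
```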

Categories: Cybersecurity, AI Vulnerabilities, Data Exfiltration 

Tags: Attack, User Data, Malicious Prompts, AI Systems, Image Resampling, Downscaling, Data Leakage, Hidden Instructions, Mitigation, Open-Source Tool 
