
AI Models Vulnerable to Stealthy Image Attacks

Researchers Update Classifier Evasion Techniques for Vision-Language Models


In 2014 a handful of researchers showed that tiny, human‑imperceptible tweaks to a picture could steer an image‑classification model toward a chosen label. The finding sparked a wave of work probing how fragile these systems really are. While the original experiments focused on pure vision networks, today’s models blend visual and textual cues, blurring the line between “seeing” and “understanding.” That shift raises a simple question: can the same pixel‑level tricks still fool a model that also processes language?

The answer matters because many downstream applications, such as content moderation, medical imaging, and autonomous navigation, rely on these multimodal systems to make high-stakes decisions. The early visual-only attacks were documented in the paper *Intriguing properties of neural networks*, whose Figure 2 illustrates how subtly altered inputs produce dramatically different outputs. Fast-forward to the present, and researchers are revisiting those techniques, adapting them for vision-language architectures.

The next section dives straight into the core of that effort: evading image classifiers.

Evading image classifiers

In 2014, researchers discovered that human-imperceptible pixel perturbations could be used to control the output of image classification models. Figure 2, from the seminal paper *Intriguing properties of neural networks*, shows how the images on the left (all distinctly and correctly classified) could be perturbed by the pixel values in the middle column (magnified for illustration) to generate the images on the right, all of which are classified as ostriches. As the field of adversarial machine learning evolved, researchers developed increasingly sophisticated attack algorithms and open-source tools.
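The original paper found these perturbations with an optimization-based search; as a rough illustration of the general idea, here is a minimal sketch of a later, widely used technique, the fast gradient sign method (FGSM), in PyTorch. The ResNet-18 surrogate, the epsilon value, and the "ostrich" target class are illustrative assumptions, not the setup from the 2014 experiments.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Pretrained classifier standing in for the attacked model (an assumption).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def targeted_fgsm(image: torch.Tensor, target_class: int, epsilon: float = 0.01) -> torch.Tensor:
    """Nudge `image` a tiny step so the classifier prefers `target_class`.

    `image` is a (1, 3, H, W) tensor already normalized for the model.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([target_class]))
    loss.backward()
    # Targeted attack: step *against* the gradient of the target-class loss,
    # bounding each pixel change by epsilon so the edit stays imperceptible.
    perturbed = image - epsilon * image.grad.sign()
    return perturbed.detach()

# Example: steer an input toward ImageNet class 9 ("ostrich").
# x_adv = targeted_fgsm(x, target_class=9)
```

A single gradient step like this is the simplest case; iterative variants (e.g. projected gradient descent) repeat the step many times while keeping the total perturbation within a small budget.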


The update shows that vision-language models now accept image and text inputs together, opening paths to graph interpretation, camera-feed analysis, and desktop-style interfaces. Yet the same multimodal reach means external, untrusted images can enter the pipeline. Since 2014, researchers have demonstrated that pixel-level tweaks invisible to humans can steer image classifiers toward arbitrary outputs, a fact illustrated in the original “Intriguing properties of neural networks” figure.

Those perturbations, when applied to the visual channel of a VLM, could in theory alter the model’s response to combined inputs. However, the article does not provide evidence that current VLMs are consistently vulnerable under realistic conditions. It remains unclear whether the evasion methods scale to the larger, transformer‑based architectures now in use.
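To make the "in theory" concrete, here is a hedged sketch of how such a perturbation might be optimized against a CLIP-style image encoder, assuming white-box gradient access and pixel values in the [0, 1] range. The `image_encoder` callable and `target_text_embedding` tensor are hypothetical placeholders; nothing here is verified against the specific vision-language models discussed in the article.

```python
import torch

def embedding_attack(image, image_encoder, target_text_embedding,
                     epsilon=8 / 255, steps=100, lr=1e-2):
    """Optimize a small perturbation that pulls the image's embedding
    toward an attacker-chosen text embedding (hypothetical setup)."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        img_emb = image_encoder(adv)
        # Maximize cosine similarity to the target text embedding.
        loss = 1 - torch.cosine_similarity(img_emb, target_text_embedding, dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Project back into an L-infinity ball so the change stays tiny.
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)
    return (image + delta).clamp(0, 1).detach()
```

Whether a perturbation built this way actually changes a full VLM's generated answer, rather than just its image embedding, is exactly the open question the article raises.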

The work therefore highlights a potential risk without confirming its prevalence, leaving open questions about how developers might need to guard against such attacks in production systems.


Common Questions Answered

How do transferable adversarial attacks work on Vision Large Language Models (VLLMs)?

Researchers discovered that attackers can craft specific image perturbations that induce targeted misinterpretations across multiple proprietary VLLMs like GPT-4o, Claude, and Gemini. These universal perturbations can consistently manipulate model interpretations, such as making hazardous content appear safe or generating incorrect responses aligned with the attacker's intent.
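The article does not spell out the algorithm behind these transferable perturbations, but a common approach is to optimize one perturbation against an ensemble of open surrogate encoders and then test whether it carries over to black-box models. A hypothetical extension of the single-encoder sketch above:

```python
import torch

def ensemble_perturbation(image, surrogate_encoders, target_embeddings,
                          epsilon=8 / 255, steps=200, lr=5e-3):
    """Optimize one perturbation against several surrogate encoders
    (hypothetical setup) so it does not overfit to a single model."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        # Average the objective over every surrogate model.
        loss = sum(
            1 - torch.cosine_similarity(enc(adv), tgt, dim=-1).mean()
            for enc, tgt in zip(surrogate_encoders, target_embeddings)
        ) / len(surrogate_encoders)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)
    return (image + delta).clamp(0, 1).detach()
```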

What types of attacks did the researchers demonstrate on vision-language models?

The study revealed four primary attack types: forcing VLMs to generate outputs of the adversary's choice, leaking information from their context window, overriding safety training, and making models believe false statements. Experiments on LLaVA, a state-of-the-art vision-language model, showed that all attack types achieved a success rate of over 80%.

Why are transferable adversarial attacks a significant concern for Vision Large Language Models?

These attacks expose critical vulnerabilities in current vision-language models, demonstrating that attackers can consistently manipulate model interpretations across different proprietary systems. The research underscores an urgent need for robust mitigations to ensure the safe and secure deployment of VLLMs, as these models become increasingly integrated into various applications.