Estonian institute benchmarks AI models' vulnerability to Russian propaganda

The Institute of the Estonian Language has put AI to the test. Sixty language models answered 75 questions in three languages, covering 14 narratives of Russian origin and phrased neutrally, with a bias, or in an outright manipulative way. Each response earned a score from 1 to 5, where a 1 means the model simply parrots the propaganda. A calibrated Claude Opus 4.5 acted as the evaluator, and its judgments were cross-checked by disinformation specialists at Propastop.

Anthropic’s Claude series topped the list, followed by Nvidia’s Nemotron 3 and Alibaba’s Qwen 3.6 Plus. Mistral’s lineup, including the newly released Medium 3.5, fell into the bottom third, echoing a NewsGuard report that put Mistral’s misinformation rate at 36.67 percent. That is hardly reassuring for a French firm that markets itself as a European alternative and is currently seeking €3 billion in new funding at a €20 billion valuation while its flagship models lag behind rivals.

The test disabled web search and external tools, so it measured only what the models had absorbed in training. That matters because Russian outlets such as “Pravda” deliberately flood the web with disinformation in the hope that AI systems will pick it up, and OpenAI recently dismantled a campaign that used ChatGPT to amplify those narratives.

How easily can Russian propaganda fool AI models? A new benchmark finds out

The Institute of the Estonian Language has released a benchmark measuring how susceptible AI language models are to Russian propaganda. Sixty models were tested with 75 questions in three languages covering 14 propaganda narratives, phrased in neutral, biased, and manipulative ways.

Each answer is scored on a scale of 1 to 5, where 1 means the model repeats Russian talking points. A calibrated Claude Opus 4.5 served as the evaluation model, and its ratings were validated by disinformation experts at Propastop.
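To make the scoring scheme more concrete, here is a minimal Python sketch of how per-answer scores like these could be rolled up into a model ranking. It is an illustration under stated assumptions, not the institute's method: the record layout, the names `Judgment`, `summarize` and `mean_by_phrasing`, the use of a plain mean, the extra "share of 1-scores" figure, and the toy scores are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Judgment:
    """One judge-assigned score for one model answer (hypothetical schema)."""
    model: str
    language: str   # language the question was asked in
    narrative: str  # which propaganda narrative the question targets
    phrasing: str   # "neutral", "biased" or "manipulative"
    score: int      # 1 = repeats the propaganda; higher is assumed to be better


def summarize(judgments):
    """Return (model, mean score, share of 1-scores) tuples, most resistant first."""
    by_model = defaultdict(list)
    for j in judgments:
        if not 1 <= j.score <= 5:
            raise ValueError(f"score out of range: {j.score}")
        by_model[j.model].append(j.score)
    rows = [
        (model, mean(scores), sum(s == 1 for s in scores) / len(scores))
        for model, scores in by_model.items()
    ]
    # Highest mean first; ties broken by the lower share of outright repetitions.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


def mean_by_phrasing(judgments, model):
    """Mean score per phrasing style for one model (does manipulative wording hurt?)."""
    by_style = defaultdict(list)
    for j in judgments:
        if j.model == model:
            by_style[j.phrasing].append(j.score)
    return {style: mean(scores) for style, scores in by_style.items()}


# Toy data for illustration only; these are NOT results from the benchmark.
toy = [
    Judgment("model-a", "lang-1", "narrative-1", "neutral", 5),
    Judgment("model-a", "lang-1", "narrative-1", "manipulative", 4),
    Judgment("model-b", "lang-1", "narrative-1", "neutral", 4),
    Judgment("model-b", "lang-1", "narrative-1", "manipulative", 1),
]

for model, avg, parrot_share in summarize(toy):
    print(f"{model}: mean {avg:.2f}, repeated propaganda in {parrot_share:.0%} of answers")
print(mean_by_phrasing(toy, "model-b"))
```

A plain mean treats every question equally and can hide a few outright repetitions behind many good answers, which is why the sketch reports the share of 1-scores alongside it; comparing neutral and manipulative phrasings for one model shows how much it gives way when nudged.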

Why this matters

We now have a concrete yardstick for how AI chatbots handle Russian disinformation, thanks to the Institute of the Estonian Language’s new benchmark. Sixty models faced 75 prompts in three languages, which together cover 14 distinct propaganda narratives and are worded neutrally, with a bias, or in an overtly manipulative way. The 1-to-5 scoring system, in which a 1 indicates outright repetition of Russian talking points, offers a clear, if stark, view of vulnerability.

Claude Opus 4.5 served as a calibrated judge, with its ratings checked by human disinformation experts, which gives the scores a consistent basis. For developers, the data show that many current systems still echo state-sponsored narratives when nudged, a reminder that fine-tuning alone may not suffice. Researchers can now compare future model iterations against this benchmark, though its focus on Russian propaganda leaves open whether similar weaknesses exist for narratives from other geopolitical sources.

Founders should ask whether their products can be audited against such tests before deployment. Ultimately, the benchmark provides a useful diagnostic, yet it remains uncertain how broadly its findings apply across the rapidly evolving AI landscape.
