
IndQA Targets India's Billion Non‑English Users, 2nd‑Largest ChatGPT Market


When you open a chat in Hindi, Tamil or Bengali, the AI often sounds like it’s guessing. That’s the gap IndQA tries to measure. Built by OpenAI, the company behind ChatGPT, the benchmark isn’t just another multilingual test; it targets a market that already ranks second in global ChatGPT usage.

India’s internet users still browse mostly in languages other than English, and the country juggles 22 official tongues, with at least seven spoken by over 50 million people each. By zeroing in on non-English interactions, the project could change how millions talk to AI. The rollout feels like a concrete step toward real accessibility, matching product work with the reality of a billion-strong user base.

In short, it shows a push to serve Indian users where they actually converse.

In its announcement, OpenAI notes that India has about a billion people who don't use English as their primary language, 22 official languages (including at least seven with over 50 million speakers), and is ChatGPT's second-largest market. The company describes the work as part of an ongoing commitment to improve its products and tools for Indian users and to make its technology more accessible throughout the country. IndQA evaluates knowledge and reasoning about Indian culture and everyday life in Indian languages.

It spans 2,278 questions across 12 languages and 10 cultural domains, created in partnership with 261 domain experts from across India. Unlike existing benchmarks such as MMMLU and MGSM, it is designed to probe culturally nuanced, reasoning-heavy tasks that those evaluations struggle to capture. IndQA covers a broad range of culturally relevant topics: Architecture & Design, Arts & Culture, Everyday Life, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation. Items are written natively in Bengali, English, Hindi, Hinglish, Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil.

Note: OpenAI says it specifically added Hinglish given the prevalence of code-switching in conversations. Each datapoint includes a culturally grounded prompt in an Indian language, an English translation for auditability, rubric criteria for grading, and an ideal answer that reflects expert expectations.
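
To make the four-part datapoint concrete, here is a minimal Python sketch of how such an item might be represented and scored against its rubric. The class names, field names, and the `rubric_score` function are hypothetical, and the per-criterion point weights are an assumption made for illustration; the article does not describe the actual data format or how criteria are combined into a score.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RubricCriterion:
    description: str
    points: float  # assumed weight; not specified in the article


@dataclass
class BenchmarkItem:
    language: str             # e.g. "Hindi" or "Hinglish"
    domain: str               # one of the 10 cultural domains
    prompt: str               # culturally grounded prompt in an Indian language
    english_translation: str  # kept for auditability
    ideal_answer: str         # reflects expert expectations
    rubric: List[RubricCriterion] = field(default_factory=list)


def rubric_score(item: BenchmarkItem, met: List[bool]) -> float:
    """Return the share of rubric points earned, between 0.0 and 1.0.

    `met` holds one pass/fail judgment per criterion, in rubric order.
    Who produces those judgments (a human or a grader model) is out of scope.
    """
    if len(met) != len(item.rubric):
        raise ValueError("need exactly one judgment per rubric criterion")
    total = sum(c.points for c in item.rubric)
    if total == 0:
        return 0.0
    earned = sum(c.points for c, ok in zip(item.rubric, met) if ok)
    return earned / total


# Placeholder item: the strings below are stand-ins, not real benchmark data.
example = BenchmarkItem(
    language="Hindi",
    domain="Food & Cuisine",
    prompt="<prompt written natively in Hindi>",
    english_translation="<English translation of the prompt>",
    ideal_answer="<expert-written ideal answer>",
    rubric=[
        RubricCriterion("Names the correct regional dish", 3.0),
        RubricCriterion("Explains the cultural context", 1.0),
        RubricCriterion("Avoids factual errors", 1.0),
    ],
)

# A response that meets the first and third criteria earns 4 of 5 points.
print(rubric_score(example, [True, False, True]))
```

Keeping the English translation beside the native prompt lets a reviewer who does not read the source language still audit whether a rubric judgment was reasonable.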


Will IndQA actually narrow the gap? OpenAI says the project is meant to steer AGI toward genuine multilingual usefulness. The company points out that about 80 percent of people worldwide do not speak English as their primary language - a blind spot that still haunts most test suites.

Benchmarks like MMMLU already look saturated; top models score near the ceiling, so real gains are hard to spot. India, with roughly a billion people whose primary language is not English, 22 official languages and at least seven languages with more than 50 million speakers each, looks like a natural proving ground - and it’s ChatGPT’s second-largest market. The team frames IndQA as a step toward better products for those users.

What’s missing, though, is a clear picture of how the new metrics will differ from MMMLU or fix its flaws, and there’s little on how any improvements will be checked in real-world use. So the idea sounds hopeful, but whether it will turn into tangible benefits for Indian users remains uncertain.

Common Questions Answered

What is IndQA and why was it created?

IndQA is a new benchmark developed by OpenAI, the company behind ChatGPT, that evaluates knowledge and reasoning about Indian culture and everyday life in Indian languages. It was created to better serve the roughly one billion people in India who do not use English as their primary language.

How many official languages does India have and how many of them have over 50 million speakers?

India has 22 official languages, and at least seven of them have more than 50 million speakers each. This linguistic diversity underpins the need for multilingual evaluation tools like IndQA.

Why does the article describe India as ChatGPT's second‑largest market?

The article notes that India ranks second globally in ChatGPT usage, even though most people in the country do not use English as their primary language. This makes India a critical market for expanding multilingual AI capabilities.

What shortcomings of existing benchmarks such as MMMLU does IndQA aim to overcome?

Existing benchmarks like MMMLU are described as saturated, with top models achieving near‑ceiling scores that make further progress difficult to measure. IndQA seeks to provide a more challenging and culturally relevant evaluation for Indian languages, helping to gauge true multilingual usefulness.