Open Source

G42 unveils open-source Hindi-English NANDA 87B, built with MBZUAI on Llama-3.1 70B


Why does a new Hindi‑English model matter in a field dominated by English‑only releases? G42’s latest offering, NANDA 87B, arrives as an open‑source project aimed at narrowing that gap. While the tech stack leans on a well‑known large language model, the effort behind it is distinctly regional.

Researchers at the Mohamed bin Zayed University of Artificial Intelligence teamed up with Inception—a G42 subsidiary—and hardware partner Cerebras to bring the system to life. The focus isn’t just on size; the training data leans heavily on Hindi, with a tokeniser tuned for the language’s nuances. Over 65 billion Hindi tokens feed the model, a scale that suggests a serious attempt to boost performance for native speakers.

In practice, that could mean more accurate translations, better question answering, and tools that respect local linguistic patterns. The following details spell out exactly how the collaboration and architecture converge to address those goals.


The model has been developed by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with Inception, a G42 company, and chipmaker Cerebras. Built on Llama-3.1 70B, NANDA 87B has been trained on more than 65 billion Hindi tokens using a Hindi-centric tokeniser to improve efficiency in training and inference. "India deserves world-class technology that speaks its language. NANDA 87B is a major step in that direction," said Manu Jain, chief executive of G42 India, adding that the model is intended to support innovation across education, entertainment and enterprise use cases in India's AI ecosystem.
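To see why a script-aware tokeniser matters for efficiency: each Devanagari character takes three bytes in UTF-8, so a generic byte-level vocabulary spends roughly three times as many tokens per character on Hindi as on English unless common Hindi subwords are merged into single entries. The sketch below is illustrative only; it does not use NANDA's actual tokeniser, just a raw byte count as a proxy for worst-case token cost.

```python
# Illustrative only: compares the raw UTF-8 byte cost of Devanagari vs
# Latin text. A byte-level tokeniser's worst case is one token per byte,
# so a higher bytes-per-character ratio means more tokens (and more
# compute) per unit of text unless the vocabulary merges Hindi subwords.

def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character of `text`."""
    return len(text.encode("utf-8")) / len(text)

hindi = "नमस्ते दुनिया"    # "Hello world" in Hindi (Devanagari script)
english = "Hello world"

print(f"Hindi:   {bytes_per_char(hindi):.2f} bytes/char")
print(f"English: {bytes_per_char(english):.2f} bytes/char")
```

A Hindi-centric tokeniser closes this gap by assigning dedicated vocabulary entries to frequent Devanagari sequences, which is the training and inference efficiency gain the announcement refers to.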

Related Topics: #G42 #NANDA 87B #Llama-3.1 #MBZUAI #Cerebras #Hindi-English #Inception #open-source

Will developers adopt it widely? The NANDA 87B model arrives as an open‑weight resource on MBZUAI’s Hugging Face page, inviting creators, developers and businesses to experiment. Built on Llama‑3.1 70B, it expands the earlier NANDA offering with 87 billion parameters and a Hindi‑centric tokeniser trained on more than 65 billion Hindi tokens.

Collaboration between Mohamed bin Zayed University of Artificial Intelligence, G42’s Inception unit and chipmaker Cerebras produced the model, suggesting a concerted effort to support Hindi‑English language tasks. Yet the model's practical performance on real‑world applications has not been disclosed, leaving its comparative strengths unclear. The open‑source nature may lower entry barriers, but without benchmark results it is difficult to gauge how it stacks up against existing multilingual models.

Its accessibility could spur community‑driven improvements, though the extent of such contributions remains uncertain. In short, NANDA 87B represents a notable addition to the pool of large language models with a specific linguistic focus, but its impact will depend on subsequent testing and adoption.

Common Questions Answered

What is the size and architecture of the NANDA 87B model?

NANDA 87B has 87 billion parameters and is built on the Llama‑3.1 70B architecture. It expands the earlier NANDA offering while retaining the underlying Llama‑3.1 foundation.

How many Hindi tokens were used to train NANDA 87B and why is a Hindi‑centric tokeniser important?

The model was trained on more than 65 billion Hindi tokens using a Hindi‑centric tokeniser. This specialized tokeniser improves training efficiency and inference performance for Hindi‑English tasks.

Which organizations collaborated to develop NANDA 87B and what roles did they play?

The development involved Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), G42’s Inception unit, and chipmaker Cerebras. MBZUAI hosts the model on its Hugging Face page, Inception is G42’s AI subsidiary, and Cerebras served as the hardware partner; the announcement does not break down each organization's role in more detail.

Where can developers access the open‑weight NANDA 87B model and what is its intended use?

Developers can download the model from MBZUAI’s Hugging Face page as an open‑source resource. It is intended for creators, developers, and businesses to experiment with Hindi‑English language applications.
