AntAngelMed: 103B-parameter open-source medical model on 1/32 MoE

A Chinese research team just put AntAngelMed on GitHub. It’s an open‑source language model aimed squarely at medical tasks, and the developers claim it’s the biggest and most capable of its kind so far. The model packs 103 billion parameters, but thanks to a 1/32 activation‑ratio mixture‑of‑experts (MoE) design, only about 6.1 billion are actually engaged when you ask a question.

While a dense model would fire every weight for each token, AntAngelMed’s routing system picks a handful of “expert” sub‑networks, keeping inference costs in line with the smaller active set. The architecture builds on Ling‑flash‑2.0, a checkpoint created by inclusionAI and shaped by what the team calls Ling Scaling Laws. That foundation gives the model a solid general‑reasoning base before it moves into the medical domain.
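The routing step described above can be sketched in a few lines. This is a minimal illustration, not AntAngelMed's actual router: the expert count (256), the number of experts picked per token (8), and the renormalisation of the selected scores are all assumptions, chosen only so that the ratio of selected to total experts comes out at 1/32. Each expert is scored with an independent sigmoid rather than a softmax across experts, and the sketch leaves load balancing out entirely.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits: list[float], top_k: int) -> dict[int, float]:
    """Pick top_k experts for one token from per-expert router logits.

    Each expert gets an independent sigmoid score (no softmax across experts).
    The selected scores are renormalised to sum to 1 and used as mixing weights
    (an assumption of this sketch, not a confirmed detail of the model).
    """
    scores = [sigmoid(z) for z in logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


# Hypothetical sizes, chosen only so that TOP_K / NUM_EXPERTS == 1/32.
NUM_EXPERTS = 256
TOP_K = 8

rng = random.Random(0)  # stand-in for the router's learned projection
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits, TOP_K)

print(f"experts used: {len(weights)} of {NUM_EXPERTS} "
      f"(ratio 1/{NUM_EXPERTS // len(weights)})")
# prints: experts used: 8 of 256 (ratio 1/32)
```

Only the experts returned by `route_token` would run for that token; the other 248 expert sub-networks in this toy setup are skipped, which is where the compute saving comes from.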

In the second training stage, supervised fine‑tuning runs on a multi‑source instruction set that blends math, programming and logic tasks with doctor‑patient Q&A and diagnostic scenarios, preserving chain‑of‑thought abilities while honing clinical knowledge.

The specific optimizations layered on top include refined expert granularity, a tuned shared-expert ratio, attention balance mechanisms, sigmoid routing without auxiliary loss, an MTP (Multi-Token Prediction) layer, QK-Norm, and Partial-RoPE (Rotary Position Embedding applied to only part of each attention head’s dimensions rather than all of them). According to the research team, these design choices together let small-activation MoE models deliver up to 7× the efficiency of similarly sized dense architectures. In practice, that means AntAngelMed, with only 6.1B activated parameters, can match the performance of a roughly 40B dense model. Separately, as output length grows during inference, the relative speed advantage can also reach 7× or more over dense models of comparable size.
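The headline numbers can be checked with a little arithmetic. The sketch below uses only the figures quoted in this article; the variable names are illustrative, and reading the "up to 7×" claim as dense-equivalent size divided by activated parameters is an interpretation, not something the team's statement spells out.

```python
# Figures quoted in the article, in billions of parameters.
TOTAL_PARAMS_B = 103.0
ACTIVE_PARAMS_B = 6.1
DENSE_EQUIVALENT_B = 40.0  # "roughly 40B dense model performance"

# Share of all weights that is engaged per token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Dense-equivalent size per activated parameter (one reading of "up to 7x").
leverage = DENSE_EQUIVALENT_B / ACTIVE_PARAMS_B

# What 1/32 of *all* weights would be, for comparison with the 6.1B figure.
naive_one_32nd_b = TOTAL_PARAMS_B / 32

print(f"active share of weights: {active_fraction:.1%}")       # 5.9%
print(f"dense-equivalent leverage: {leverage:.1f}x")           # 6.6x
print(f"1/32 of total parameters: {naive_one_32nd_b:.1f}B")    # 3.2B
```

The leverage of about 6.6× is in line with the "up to 7×" figure. The 6.1B activated parameters are nearly double what a flat 1/32 of all weights would give. One plausible reading, which the article does not confirm, is that the 1/32 ratio counts routed experts rather than total weights, with attention layers, embeddings and any shared expert running for every token.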

https://modelscope.cn/models/MedAIBase/AntAngelMed

Training Pipeline

AntAngelMed uses a three-stage training process designed to layer deep medical domain adaptation on top of general language understanding.

The first stage is continual pre-training on large-scale medical corpora, including encyclopedias, web text, and academic publications.

Why this matters

We see a 103‑billion‑parameter model released under an open licence, which is unusual for the medical domain. Built on a 1/32 activation‑ratio Mixture‑of‑Experts, AntAngelMed engages only a fraction of its weights at inference, promising lower compute costs. The authors list refinements such as finer expert granularity, a tuned shared‑expert ratio, attention‑balance mechanisms, sigmoid routing without auxiliary loss, a Multi‑Token Prediction layer, QK‑Norm and Partial‑RoPE.

These tricks suggest an effort to squeeze performance from a massive architecture while keeping hardware demands modest. Yet the claim of being “the largest and most capable of its kind” lacks third‑party benchmarks, so we cannot yet gauge how it stacks up against existing medical LLMs. For developers, the open‑source nature could lower entry barriers, but integration will depend on documentation and community support that remain unclear.

Researchers may find a testbed for MoE‑based medical NLP, though the practical impact on clinical tasks is still uncertain.
