AntAngelMed: 103B-parameter open-source medical model on 1/32 MoE

By AI Daily Post Edited by Brian Petersen, Editor-in-Chief

May 12, 2026 • 2 min read

A Chinese research team just put AntAngelMed on GitHub. It’s an open‑source language model aimed squarely at medical tasks, and the developers claim it’s the biggest and most capable of its kind so far. The model packs 103 billion parameters, but thanks to a 1/32 activation‑ratio mixture‑of‑experts (MoE) design, only about 6.1 billion are actually engaged when you ask a question.

While a dense model would fire every weight for each token, AntAngelMed’s routing system picks a handful of “expert” sub‑networks, keeping inference costs in line with the smaller active set. The architecture builds on Ling‑flash‑2.0, a checkpoint created by inclusionAI and shaped by what the team calls Ling Scaling Laws. That foundation gives the model a solid general‑reasoning base before it moves into the medical domain.
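As a rough illustration of the routing idea, here is a minimal sketch of a sparse MoE layer that scores experts with independent sigmoids and runs only the top-scoring ones. The expert count (32), the top-k value (1), the scalar toy experts, and the function names `route_token` and `moe_forward` are all assumptions chosen to mirror a 1/32 ratio; the article does not give AntAngelMed's actual expert configuration, and this is not the team's implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def route_token(router_logits, top_k):
    """Pick the top_k experts by sigmoid score.

    Scores are independent sigmoids (not a softmax); the selected
    scores are renormalized so the mixing weights sum to 1.
    """
    scores = [sigmoid(z) for z in router_logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]

def moe_forward(x, experts, router_logits, top_k):
    """Weighted sum over the selected experts only; the rest never run."""
    out = 0.0
    for idx, weight in route_token(router_logits, top_k):
        out += weight * experts[idx](x)
    return out

# Hypothetical toy setup: 32 scalar "experts", 1 active per token.
NUM_EXPERTS = 32
TOP_K = 1
rng = random.Random(0)
experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x) for _ in range(NUM_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits, TOP_K)
y = moe_forward(2.0, experts, logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # 1/32 of the routed experts run per token
```

In a real model each expert would be a feed-forward sub-network and the router logits would come from a learned projection of the token's hidden state; the point here is only that per-token compute scales with the experts selected, not with the total.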

In the second phase, supervised fine‑tuning runs on a multi‑source instruction set that blends math, programming and logic tasks with doctor‑patient Q&A and diagnostic scenarios, preserving chain‑of‑thought abilities while honing clinical knowledge.

The specific optimizations layered on top include: refined expert granularity, a tuned shared expert ratio, attention balance mechanisms, sigmoid routing without auxiliary loss, an MTP (Multi-Token Prediction) layer, QK-Norm, and Partial-RoPE (Rotary Position Embedding applied to a subset of attention heads rather than all of them). According to the research team, these design choices together allow small-activation MoE models to deliver up to 7× efficiency compared to similarly sized dense architectures. In practice, that means AntAngelMed, with only 6.1B activated parameters, can match the performance of a roughly 40B dense model. Separately, as output length grows during inference, the relative speed advantage can also reach 7× or more over dense models of comparable size.
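The figures quoted so far can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the article (103B total, 6.1B active, roughly 40B dense-equivalent, 1/32 ratio); the variable names are illustrative.

```python
total_params_b = 103.0    # total parameters, in billions
active_params_b = 6.1     # parameters engaged per token, in billions
dense_equiv_b = 40.0      # dense model size it is said to match, in billions

# Share of all weights that are active for a given token.
active_fraction = active_params_b / total_params_b    # about 0.059

# Nominal expert activation ratio quoted for the MoE design.
nominal_ratio = 1 / 32                                # 0.03125

# Dense-equivalent size per active parameter.
leverage = dense_equiv_b / active_params_b            # about 6.56

print(f"active share of weights: {active_fraction:.1%}")
print(f"nominal 1/32 ratio:      {nominal_ratio:.1%}")
print(f"dense-equivalent ratio:  {leverage:.2f}x")
```

The dense-equivalent ratio of about 6.6× is in line with the "up to 7×" claim. The active share of weights (about 5.9%) is higher than the nominal 1/32 (about 3.1%); that gap would be consistent with always-on components such as attention layers and a shared expert counting toward the active total, though the article does not break this down.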

https://modelscope.cn/models/MedAIBase/AntAngelMed

Training Pipeline

AntAngelMed uses a three-stage training process designed to layer deep medical domain adaptation on top of general language understanding.

The first stage is continual pre-training on large-scale medical corpora, including encyclopedias, web text, and academic publications.

Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture - MarkTechPost

Why this matters

We see a 103‑billion‑parameter model released under an open licence, which is unusual for the medical domain. Built on a 1/32 activation‑ratio Mixture‑of‑Experts, AntAngelMed only engages a fraction of its weights at inference, promising lower compute costs. The authors list refinements such as expert granularity, a tuned shared expert ratio, attention‑balance mechanisms, sigmoid routing without auxiliary loss, a Multi‑Token Prediction layer, QK‑Norm and Partial‑RoPE.

These tricks suggest an effort to squeeze performance from a massive architecture while keeping hardware demands modest. Yet the claim of being “the largest and most capable of its kind” lacks third‑party benchmarks, so we cannot yet gauge how it stacks up against existing medical LLMs. For developers, the open‑source nature could lower entry barriers, but integration will depend on documentation and community support that remain unclear.

Researchers may find a testbed for MoE‑based medical NLP, though the practical impact on clinical tasks is still uncertain.
