DeepReinforce releases Ornith-1.0 open-source model with state‑of‑the‑art results
DeepReinforce has released Ornith‑1.0, an open‑source family of coding agents that rethinks how a model interacts with its own prompt scaffold. The lineup runs from a 9‑billion‑parameter dense model up to a 397‑billion‑parameter mixture‑of‑experts flagship, all released under an MIT license on Hugging Face. The models are post‑trained on top of pretrained Gemma 4 and Qwen 3.5, and each checkpoint comes with FP8 and GGUF builds for faster local serving.
What sets Ornith‑1.0 apart is that, instead of relying on a fixed, human‑crafted harness, the model learns to write its own during reinforcement learning, jointly optimizing the scaffold and the code solution. The research team reports that this approach yields state‑of‑the‑art results among open models of comparable size. To keep the system honest, three guardrails (a fixed trust boundary, a deterministic monitor, and a frozen LLM judge) stand between the model and its reward signal, aiming to curb reward hacking.
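To make the three guardrails more concrete, here is a minimal Python sketch of how such checks could gate a reward signal. The article does not describe DeepReinforce's implementation, so everything below is an assumption for illustration: the `Episode` fields, the protected path prefixes, the banned patterns, and the pass-rate reward are all hypothetical, and the frozen LLM judge is represented by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Episode:
    """Hypothetical record of one RL rollout (not the authors' data model)."""
    touched_paths: Tuple[str, ...]  # files the agent modified
    tests_passed: int
    tests_total: int
    transcript: str                 # everything the agent wrote or ran


# Assumption: the grader's own files sit outside the agent's writable area.
PROTECTED_PREFIXES = ("tests/", "grader/")

# Assumption: simple patterns that a rule-based monitor might flag.
BANNED_PATTERNS = ("pytest.skip", "sys.exit(0)")


def within_trust_boundary(ep: Episode) -> bool:
    """Layer 1: reject episodes that edit files reserved for the grader."""
    return not any(p.startswith(PROTECTED_PREFIXES) for p in ep.touched_paths)


def monitor_ok(ep: Episode) -> bool:
    """Layer 2: deterministic check, same verdict for the same transcript."""
    return not any(pattern in ep.transcript for pattern in BANNED_PATTERNS)


def gated_reward(ep: Episode, judge: Callable[[Episode], bool]) -> float:
    """Return the test pass rate only if all three layers approve.

    `judge` stands in for a frozen LLM judge, i.e. one whose weights are
    not updated while the policy trains.
    """
    if not within_trust_boundary(ep):
        return 0.0
    if not monitor_ok(ep):
        return 0.0
    if not judge(ep):
        return 0.0
    if ep.tests_total == 0:
        return 0.0
    return ep.tests_passed / ep.tests_total
```

In this sketch an episode that passes every test by editing a file under `tests/` earns zero reward, because the first layer rejects it before the pass rate is ever computed. Ordering the cheap deterministic checks ahead of the judge is a design choice of the sketch, not something the article confirms.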
In short, Ornith‑1.0 is positioned as a reasoning‑first coding model, opening each reply with a `
The DeepReinforce research team reports state-of-the-art results among open models of comparable size.
TL;DR
- Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
- The model learns its own scaffold during RL, jointly optimizing the harness and the solution.
- Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B.
- Three layers -- fixed trust boundary, deterministic monitor, frozen LLM judge -- guard against reward hacking.
What is Ornith-1.0?
Ornith-1.0 is a set of reasoning models tuned for coding agents.
Why this matters
Ornith‑1.0 adds four new open‑source models to the coding‑assistant space, ranging from a 9B dense checkpoint to a 397B mixture‑of‑experts flagship, all under an MIT license on Hugging Face. The family builds on Gemma 4 and Qwen 3.5, and DeepReinforce’s team claims state‑of‑the‑art results among open models of comparable size. For developers, the permissive license means they can experiment without legal friction, and the variety of scales lets teams match compute budgets to project needs.
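As a rough aid for matching a model size to a compute budget, the Python sketch below estimates how much memory the weights alone would occupy. It rests on several assumptions: the size labels are treated as total parameter counts (for the MoE models, every expert still has to be stored), FP8 is taken as one byte per parameter, and the 16-bit figure is a comparison baseline that the article does not mention. It ignores the KV cache, activations, and runtime overhead, and it does not cover GGUF builds, whose size depends on the quantization chosen.

```python
# Parameter counts in billions, read off the size labels in the article.
# Assumption: each label is the model's total parameter count.
PARAMS_BILLIONS = {
    "9B": 9,
    "31B": 31,
    "35B-MoE": 35,
    "397B-MoE": 397,
}

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp8": 1.0,    # 8 bits per weight
    "16-bit": 2.0,  # comparison baseline, not taken from the article
}


def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param


if __name__ == "__main__":
    for name, billions in PARAMS_BILLIONS.items():
        row = ", ".join(
            f"{precision}: ~{weight_gb(billions, nbytes):.0f} GB"
            for precision, nbytes in BYTES_PER_PARAM.items()
        )
        print(f"{name:>9}  {row}")
```

Under these assumptions the FP8 weights come to roughly one gigabyte per billion parameters, so the figure is a lower bound on what serving actually requires rather than a sizing guarantee.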
Founders may see a drop‑in option for internal tooling, yet the “state‑of‑the‑art” claim rests on the authors’ own reported benchmarks; it is unclear whether those results hold across diverse real‑world codebases. Researchers gain a publicly available MoE at 397B, which could serve as a testbed for agentic coding experiments, but the reliance on third‑party base models (Gemma 4 and Qwen 3.5) raises questions about reproducibility. In short, Ornith‑1.0 expands the open‑source coding model toolkit, but its practical impact will depend on broader validation and community uptake.