
Gemma 4: Google's Lean Open-Source AI Model Unleashed

Google releases Gemma 4 under Apache 2.0, promising lower memory use and near-zero latency


Google has put its latest Gemma 4 models into the open‑source arena, moving them to an Apache 2.0 licence and promising a tighter fit for everyday machines. The shift follows the earlier Gemma 3 release, which already let developers run large language models on laptops and phones. This time, Google says the new series trims the resource demands even further while pushing inference speed to a point where delays are barely perceptible.

The company frames the upgrade as the most capable option anyone can host locally, positioning it as a practical alternative to cloud‑only offerings. For developers weighing memory footprints, battery drain and responsiveness, those claims matter. Here's how the claims are being framed:

Not only do they use less memory and battery than Gemma 3, but Google also touts "near-zero latency" this time around. All the new Gemma 4 models will reportedly leave Gemma 3 in the dust; Google claims these are the most capable models you can run on your local hardware. Google says Gemma 31B will debut at number three on the Arena list of top open AI models, behind GLM-5 and Kimi 2.5. However, even the biggest Gemma 4 variant is a fraction of the size of those models, making it theoretically much cheaper to run.

Google’s Gemma 4 arrives under an Apache 2.0 licence, a clear shift from the proprietary terms that have governed its Gemini siblings. Four model sizes are now advertised for local deployment, each tuned to consume less memory and battery than the year‑old Gemma 3. Google also touts “near‑zero latency,” positioning the suite as the most capable open‑weight option for on‑device inference.

The licensing change directly addresses developer complaints about restrictive AI terms, offering broader freedom to experiment and integrate. Yet the claim that these models will “leave Gemma 3 in the dust” rests on internal benchmarks; independent verification is still pending. Likewise, the promise of “most capable” performance on local hardware is compelling, but whether the improvements translate across diverse workloads remains unclear.

In practice, developers now have a set of lighter, faster models they can run without cloud dependencies, and the open licence removes a barrier that has frustrated many. Whether the combination of lower resource demands and the new licence will drive wider adoption remains to be seen.

Common Questions Answered

How does Gemma 4 improve upon the previous Gemma 3 models in terms of performance?

Gemma 4 offers significant improvements by reducing memory and battery consumption compared to Gemma 3. Google claims these new models provide near-zero latency and are more powerful, with the Gemma 31B model expected to rank third on the Arena list of top open AI models.

What licensing approach is Google using for the Gemma 4 models?

Google has released Gemma 4 under the Apache 2.0 license, which is a significant departure from the proprietary terms used for its Gemini models. This open licensing approach addresses developer concerns and provides more flexibility for use and deployment of the AI models.

What are the deployment capabilities of the Gemma 4 models?

Google has designed Gemma 4 to be highly deployable on local hardware, with four different model sizes available for on-device inference. The models are specifically optimized to run efficiently on laptops, phones, and other everyday machines with minimal resource requirements.