
Google Gemini's Deep Think tops ARC-AGI-2 benchmark; Nvidia announces new open AI models for autonomous driving research


Why does this matter now? The latest AI roundup—Last Week in AI #328—lists a crowded field: DeepSeek 3.2, Mistral 3, Trainium3 and Runway Gen‑4.5 all made headlines. Yet two moves stand out for anyone watching how research‑grade tools become publicly usable.

While Google’s Gemini suite has been expanding its feature set, a new mode called Deep Think is limited to AI Ultra subscribers inside the Gemini app. In parallel, Nvidia has pushed a suite of open‑source models and tooling aimed at autonomous‑driving research, accompanied by a freshly published paper. The juxtaposition is striking: a high‑performance, subscriber‑only reasoning engine on one side, and a push toward openly available driving‑AI resources on the other.

Both signal where the industry is betting its next advances—complex problem solving and real‑world vehicle intelligence. The details that follow clarify exactly how Deep Think performed on the ARC‑AGI‑2 benchmark and what Nvidia’s open offerings entail.


Available only to Google AI Ultra subscribers in the Gemini app, Deep Think mode tops the ARC-AGI-2 reasoning benchmark and targets complex math, science, and logic problems. Nvidia announces new open AI models and tools for autonomous driving research. Alongside the releases, Nvidia published a Cosmos Cookbook on GitHub with guides, inference resources, and workflows to help developers curate data, generate synthetic data, and fine-tune Cosmos-based models for autonomous driving research.

Black Forest Labs has launched Flux.2, a family of AI image models positioned against Nano Banana Pro and Midjourney. Flux.2 is a new image generation and editing system comprising four models designed to support production-grade creative workflows.

Related Topics: #Google Gemini #Deep Think #ARC-AGI-2 #Nvidia #autonomous driving #Cosmos Cookbook #Flux.2 #Black Forest Labs #DeepSeek 3.2

DeepSeek’s two new models, V3.2 and the temporarily‑available Speciale, are now on Hugging Face and already running in the company’s app, web and API layers. The open‑source focus suggests a push toward broader community testing, yet the impact of the “reasoning‑first” label remains to be measured against other offerings. Meanwhile, Google’s Gemini app reserves its Deep Think mode for AI Ultra subscribers; the feature topped the ARC‑AGI‑2 benchmark and is aimed at complex math, science and logic problems.

Because access is limited, it is unclear whether the performance lead will translate into wider market relevance. Nvidia's announcement of new open AI models and tools for autonomous-driving research adds another piece to the puzzle, accompanied by a recently published paper, though details on the models' capabilities and rollout strategy remain sparse. The concurrent releases highlight a busy week for AI development, but the practical significance of these advances, especially given subscription constraints and the nascent state of Nvidia's tools, remains uncertain.


Common Questions Answered

What benchmark does Google Gemini's Deep Think mode top, and what types of problems is it designed to solve?

Deep Think mode tops the ARC-AGI-2 reasoning benchmark, demonstrating superior performance on complex math, science, and logic problems. The mode is specifically engineered to handle intricate analytical tasks that require advanced reasoning capabilities.

Who can access the Deep Think mode in the Gemini app, and how is it positioned within Google's subscription tiers?

Deep Think mode is available exclusively to Google AI Ultra subscribers within the Gemini app. This restriction places the feature behind Google's highest‑level subscription, targeting power users who need cutting‑edge reasoning tools.

What resources did Nvidia release to support autonomous driving research, and where can developers find them?

Nvidia published a Cosmos Cookbook on GitHub, offering guides, inference resources, and workflows for data curation, synthetic data generation, and fine‑tuning Cosmos‑based models. These open‑source tools are intended to accelerate autonomous driving research by providing a comprehensive development pipeline.

Which new DeepSeek models were highlighted in the article, and how are they being made available to the community?

DeepSeek introduced two models, V3.2 and the temporarily‑available Speciale, which are now hosted on Hugging Face. They are already integrated into DeepSeek's app, web interface, and API layers, reflecting an open‑source push for broader community testing.
