Gemini Deep Research agent posts top results on HLE and DeepSearchQA, Google's best yet on BrowseComp
Why does a research‑oriented AI model matter now? Companies and scholars alike have been wrestling with the cost of generating thorough, citation‑rich reports, especially when the work demands both depth and speed. Gemini’s latest Deep Research agent steps into that gap, aiming to balance performance with affordability.
While earlier versions excelled at specific tasks, this iteration was built with a broader suite of evaluations in mind, ranging from the notoriously tough Humanity’s Last Exam (HLE) to DeepSearchQA, a benchmark of multi‑step web‑research tasks, and the BrowseComp web‑browsing challenge. The engineering team focused on trimming inference costs without sacrificing the nuance required for academic‑level output. As a result, users can expect a tool that not only tackles complex question sets but also produces well‑researched reports at much lower cost.
The upcoming rollout promises a more capable, cost‑effective research assistant, positioning Gemini to address a long‑standing pain point for data‑heavy projects.
The new Gemini Deep Research agent achieves state-of-the-art results on Humanity's Last Exam (HLE) and DeepSearchQA, and is our best on BrowseComp. It is optimized to generate well-researched reports at much lower cost. Deep Research is now more useful and intelligent than ever, and will soon be available in Google Search, NotebookLM, Google Finance and upgraded in the Gemini App.
Gemini Deep Research achieves a state-of-the-art 46.4% on the full Humanity's Last Exam (HLE) set, 66.1% on DeepSearchQA and a high 59.2% on BrowseComp.
DeepSearchQA: a benchmark for deep research agents
Existing benchmarks often fail to capture the complexity of real-world, multi-step web research. This is why we are open-sourcing DeepSearchQA, a new benchmark to evaluate agents on intricate, multi-step information-seeking tasks.
Will developers adopt it? The new Gemini Deep Research agent, now reachable through the Interactions API, promises to embed Google’s most advanced autonomous research capabilities into external applications. It claims state‑of‑the‑art performance on Humanity’s Last Exam and the newly released DeepSearchQA benchmark, and posts Google’s best result to date on BrowseComp.
Optimized for long‑running context gathering, the agent is said to produce well‑researched reports at a lower cost. The open‑source DeepSearchQA benchmark offers a way to measure comprehensiveness on web‑research tasks, which could help validate those claims. Yet, it remains unclear how the lower‑cost promise translates across diverse workloads, and whether the reported superiority on benchmarks will hold in real‑world deployments.
The announcement notes that Deep Research is “more useful and intelligent than ever,” and that broader availability is forthcoming. Without independent verification, the practical impact is still uncertain. For now, the agent represents a notable step in making sophisticated research tools accessible to developers, pending further testing and adoption data.
Common Questions Answered
What state‑of‑the‑art score does the Gemini Deep Research agent achieve on Humanity's Last Exam (HLE)?
The Gemini Deep Research agent scores 46.4% on the full Humanity's Last Exam (HLE) set, which Google reports as state‑of‑the‑art performance on that benchmark.
How does the Gemini Deep Research agent perform on the DeepSearchQA benchmark compared to previous models?
On the newly open‑sourced DeepSearchQA benchmark, the Gemini Deep Research agent scores 66.1%, which Google reports as state‑of‑the‑art. The benchmark evaluates agents on intricate, multi‑step information‑seeking tasks, the kind of long‑running context gathering the agent is optimized for.
In which Google products will the Gemini Deep Research agent be integrated soon?
Google plans to bring Deep Research to Google Search, NotebookLM and Google Finance, and to upgrade it in the Gemini App. These integrations aim to provide users with more intelligent, well‑researched outputs directly within familiar platforms.
What developer access is provided for the Gemini Deep Research agent and what capabilities does it enable?
Developers can reach the Gemini Deep Research agent through the Interactions API, allowing external applications to embed Google’s most advanced autonomous research features. This access supports long‑running context gathering and lower‑cost report generation, backed by state‑of‑the‑art results on HLE and DeepSearchQA and Google’s best result to date on BrowseComp.