UC San Diego Lab Uses NVIDIA DGX B200 to Pursue Low‑Latency LLM Serving
The UC San Diego lab has recently installed NVIDIA’s DGX B200, a system whose specs the team bills as “awesome” for heavyweight AI workloads. In a field where generative models can churn out impressive text but often lag behind user expectations, the push for real‑time interaction is gaining urgency. Researchers at the Hao AI Lab are zeroing in on that gap, testing whether the raw compute and memory bandwidth of the DGX B200 can shave milliseconds off inference times.
Their work isn’t just about raw speed; it’s about making large language models practical for applications that demand instant feedback—think conversational agents, live translation, or interactive tutoring. By mapping the hardware’s capabilities to the latency constraints of emerging services, the team hopes to chart a path that moves beyond batch processing toward truly responsive AI. The results could inform how universities and enterprises design next‑generation AI infrastructure, especially as the industry looks to balance model size with user‑perceived performance.
Other ongoing projects at the Hao AI Lab explore new ways to achieve low-latency LLM serving, pushing large language models toward real-time responsiveness. "Our current research uses the DGX B200 to explore the next frontier of low-latency LLM-serving on the awesome hardware specs the system gives us," said Junda Chen, a doctoral candidate in computer science at UC San Diego.

How DistServe Influenced Disaggregated Serving

Disaggregated inference is a way to ensure large-scale LLM-serving engines can achieve optimal aggregate system throughput while maintaining acceptably low latency for user requests.
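DistServe's core idea is to split the compute-heavy prefill (prompt-processing) phase from the latency-sensitive decode (token-generation) phase so each can be scheduled and scaled on its own. The sketch below illustrates that split in plain Python; the worker names, the in-process queues, and the placeholder "KV cache" strings are illustrative assumptions, not DistServe's actual implementation or API.

```python
# Minimal sketch of disaggregated prefill/decode serving (illustrative only).
import queue
import threading
from dataclasses import dataclass

@dataclass
class PrefillResult:
    request_id: int
    kv_cache: list      # stand-in for the KV cache produced by the prefill pass
    first_token: str

def prefill_worker(requests: queue.Queue, handoff: queue.Queue) -> None:
    """Run the compute-heavy prompt pass, then hand the KV cache to decode."""
    while True:
        req = requests.get()
        if req is None:                      # sentinel: no more requests
            handoff.put(None)
            break
        request_id, prompt = req
        # Placeholder for a real prefill forward pass over the prompt tokens.
        kv_cache = [f"kv({tok})" for tok in prompt.split()]
        handoff.put(PrefillResult(request_id, kv_cache, first_token="<bos>"))

def decode_worker(handoff: queue.Queue, max_new_tokens: int = 4) -> None:
    """Stream output tokens from the KV cache without blocking new prefills."""
    while True:
        result = handoff.get()
        if result is None:                   # sentinel propagated from prefill
            break
        tokens = [result.first_token]
        for step in range(max_new_tokens):
            # Placeholder for an incremental decode step that extends the cache.
            tokens.append(f"tok{step}")
        print(f"request {result.request_id}: {' '.join(tokens)}")

if __name__ == "__main__":
    requests_q, handoff_q = queue.Queue(), queue.Queue()
    decoder = threading.Thread(target=decode_worker, args=(handoff_q,))
    decoder.start()
    for i, prompt in enumerate(["hello world", "low latency serving"]):
        requests_q.put((i, prompt))
    requests_q.put(None)                     # signal the end of the request stream
    prefill_worker(requests_q, handoff_q)    # run prefill on the main thread
    decoder.join()
```

In a real disaggregated deployment the two roles run on separate GPUs or nodes, and the KV cache is transferred over a fast interconnect rather than an in-process queue.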
The benefit of disaggregated inference lies in optimizing what DistServe calls "goodput" instead of "throughput" in the LLM-serving engine. Here's the difference: throughput is the number of tokens per second that the entire system can generate, and higher throughput means a lower cost per token served to the user.
For a long time, throughput was the only metric LLM-serving engines used to compare their performance. But while throughput measures the aggregate performance of the system, it doesn't directly correlate with the latency that a user perceives. If users demand lower latency for each generated token, the system has to sacrifice throughput.
This natural trade-off between throughput and latency is what led the DistServe team to propose a new metric, "goodput": throughput measured only over requests that satisfy user-specified latency objectives, usually called service-level objectives (SLOs). In other words, goodput reflects how much useful work the system does while preserving the user experience. DistServe shows that goodput is a much better metric for LLM-serving systems, as it factors in both cost and service quality.
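To make the distinction concrete, here is a minimal sketch of how the two metrics could be computed from a serving trace. The request values, the SLO thresholds, and the choice of time-to-first-token (TTFT) and time-per-output-token (TPOT) as the latency objectives are illustrative assumptions, not measurements from the DGX B200.

```python
# Minimal sketch contrasting throughput with goodput over a toy request trace.
from dataclasses import dataclass

@dataclass
class CompletedRequest:
    tokens_generated: int
    time_to_first_token_s: float    # TTFT observed for this request
    time_per_output_token_s: float  # TPOT observed for this request

def throughput(requests: list, window_s: float) -> float:
    """Tokens per second generated by the whole system, regardless of latency."""
    return sum(r.tokens_generated for r in requests) / window_s

def goodput(requests: list, window_s: float,
            ttft_slo_s: float, tpot_slo_s: float) -> float:
    """Tokens per second counted only for requests that met both latency SLOs."""
    ok = [r for r in requests
          if r.time_to_first_token_s <= ttft_slo_s
          and r.time_per_output_token_s <= tpot_slo_s]
    return sum(r.tokens_generated for r in ok) / window_s

if __name__ == "__main__":
    trace = [
        CompletedRequest(200, 0.15, 0.030),  # meets both SLOs
        CompletedRequest(300, 0.80, 0.025),  # TTFT too slow: excluded from goodput
        CompletedRequest(250, 0.20, 0.090),  # TPOT too slow: excluded from goodput
    ]
    window = 10.0  # seconds of serving covered by the trace
    print(f"throughput: {throughput(trace, window):.1f} tok/s")         # 75.0
    print(f"goodput:    {goodput(trace, window, 0.5, 0.05):.1f} tok/s")  # 20.0
```

In this toy trace the system generates 75 tokens per second overall, but only 20 tokens per second come from requests that met their SLOs; that gap between throughput and goodput is exactly what the metric is designed to expose.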
With direct access to the DGX B200, researchers are probing how far inference latency can be reduced for near‑real‑time interaction, a goal that builds on earlier work such as DistServe, which already informs production systems like NVIDIA Dynamo. Their ongoing projects explore novel serving techniques, but the path from laboratory experiments to reliable, large‑scale deployment has not yet been demonstrated. Short‑term latency gains are measurable on the DGX B200; longer‑term impacts on broader AI services remain uncertain.
The lab’s focus on low‑latency serving suggests a continued push toward more responsive language models, yet whether these advances will translate into consistent performance across varied workloads is still an open question. Ultimately, the work underscores how access to cutting‑edge hardware can accelerate specific research directions, even as the broader relevance of the results awaits further validation.
Further Reading
- UC San Diego Lab Advances Generative AI With NVIDIA DGX B200 - NVIDIA Blog
- UC San Diego Leverages NVIDIA DGX B200 for Advanced AI Research - Blockchain.News
- UC San Diego Packs a Punch of AI Research Power with a Gift from NVIDIA - UC San Diego Today
- UC San Diego's AI Lab Gets NVIDIA's Most Powerful Chip - TechBuzz
- Overlooked 18 Months Ago, Now Dominating AI Inference with Disaggregated Serving - 36Kr Global
Common Questions Answered
What hardware does the UC San Diego Hao AI Lab use to investigate low‑latency LLM serving?
The lab has installed NVIDIA’s DGX B200, a high‑performance system praised for its "awesome" specifications. Researchers are leveraging its raw compute power and memory bandwidth to reduce inference latency for large language models.
How does the DGX B200 help the Hao AI Lab address real‑time interaction challenges with generative models?
By providing massive parallel processing and fast memory access, the DGX B200 enables the team to shave milliseconds off LLM inference times. This reduction is critical for moving generative models from lagging outputs toward near‑real‑time responsiveness.
Which prior research influences the Hao AI Lab's current low‑latency serving projects?
The lab builds on concepts from DistServe, a disaggregated inference approach that separates the prefill (prompt-processing) and decoding (token-generation) phases of LLM inference so they can be scheduled and scaled independently. DistServe's ideas have already informed production systems like NVIDIA Dynamo, guiding the lab's exploration of novel serving techniques.
Who is leading the research on low‑latency LLM serving at UC San Diego, and what is their role?
Doctoral candidate Junda Chen leads the effort, representing the Hao AI Lab at UC San Diego. Chen emphasizes using the DGX B200's capabilities to push the frontier of low‑latency LLM serving and achieve near‑real‑time interaction.