Research & Benchmarks - Page 4 of 18

Academic AI research, performance benchmarks, scientific breakthroughs, and peer-reviewed studies advancing artificial intelligence frontiers.

344 articles

Google study: AI benchmarks ignore human disagreement; under 10 raters fail

Google’s latest internal audit of AI evaluation methods raises a straightforward question: are we trusting too few human judgments when we compare models?

April 5, 2026

• 2 min read

Alibaba's Qwen team adds method that lengthens AI answers, prompting reasoning

Alibaba’s Qwen team has rolled out a new algorithm that nudges its language models to produce longer, more reflective replies.

April 5, 2026

• 2 min read

Open models cross threshold; frontier models show per‑category correctness

Why does this matter now? Because the latest benchmark run shows a clear split in how open‑source and commercial systems handle category‑specific tasks.

April 3, 2026

• 2 min read

Batch Mode VC-6 and NVIDIA Nsight Speed Up Vision AI Pipelines

Batch Mode VC‑6 promises to squeeze more throughput out of vision‑AI workloads, but raw speed isn’t enough without a clear view of where time is spent.

April 2, 2026

• 2 min read

CaP-Agent0 Beats Human Code on 4 of 7 Robot Tasks Using Low‑Level Blocks

Why should anyone care whether a robot writes its own code? In the field of robot control, most recent breakthroughs lean on hand‑crafted primitives—tiny modules that engineers stitch together to get a machine moving.

April 2, 2026

• 2 min read

Nvidia breaks MLPerf records with 288 GPUs as AMD, Intel pursue other goals

Nvidia just turned another page in the MLPerf scorebook, cranking out record‑setting numbers by running its B200 and B300 models on a 288‑GPU cluster.

April 2, 2026

• 2 min read

NVIDIA's 288-GPU Blackwell Ultra Sets New MLPerf Inference Throughput Record

NVIDIA’s latest hardware push is more than a headline; it’s a concrete test of how far inference performance can be stretched when architecture, software and system integration are engineered as a single unit.

April 2, 2026

• 2 min read

DeepMind study finds six traps that let a few poisoned docs hijack AI agents

DeepMind’s latest research paper catalogues six distinct ways that seemingly innocuous inputs can commandeer autonomous AI agents operating in open environments.

April 1, 2026

• 2 min read

AI productivity gap: top agent beats baseline in 1 of 15 runs, 26.5% subtasks

Why does the hype around AI productivity often feel out of step with what actually gets delivered? While labs showcase glossy numbers, the underlying data tells a quieter story.

March 31, 2026

• 2 min read

Nvidia's DLSS 4.5 beta adds 6x Multi Frame Generation for RTX 50 GPUs

Nvidia’s latest software rollout nudges its deep‑learning super‑sampling tech into a new performance tier.

March 31, 2026

• 2 min read

AI sycophancy cuts apologies, raises double‑downs; lifts moral trust

Why does it matter when a chatbot mirrors the tone you expect? Researchers set out to see whether an AI’s conversational style could sway how people own up to mistakes or choose to settle disputes.

March 30, 2026

• 3 min read

AI models fabricate image descriptions; benchmarks miss the shortcuts

Why does it matter when a system tells you it “sees” something it never actually looked at?

March 30, 2026

• 2 min read

Cohere's open-weight ASR model reaches 5.4% WER, ready for production use

Cohere’s newest speech‑to‑text system hits a 5.4% word error rate, a figure that sits at the low end of what many enterprises consider acceptable for live‑customer interactions.

March 30, 2026

• 2 min read

Free API that evolved from slow web search to top AI tool, beyond scraping

The roundup of free web APIs just added a service that’s quietly reshaped how autonomous agents pull data from the internet.

March 27, 2026

• 3 min read

Meta unveils open-source brain AI, adds Scrunch site audit and Suno v5.5

Meta’s latest push into open‑source AI isn’t just another research paper; it’s a bundle of tools aimed at developers and marketers alike.

March 27, 2026

• 2 min read

AI assurance experts meet to build infrastructure for safe, high‑quality systems

The AI community has been wrestling with a simple question: how do we move from flashy prototypes to systems that people can actually rely on?

March 27, 2026

• 2 min read

Study finds overly flattering AI advice can impair users' judgment

Why does it matter when a chatbot tells you what you want to hear? A new study, published under the title “Sycophantic AI can undermine human judgment,” probes exactly that question.

March 26, 2026

• 3 min read

xMemory reduces token usage and context bloat versus MemGPT's raw logging

Early AI agents often treat every exchange as a line in a ledger, appending each utterance to a growing transcript.

March 25, 2026

• 3 min read

Mozilla dev launches cq, a Stack Overflow‑style hub for agents

Mozilla’s latest open‑source effort, cq, aims to give autonomous agents a place to post solutions and borrow tricks the way developers turn to Stack Overflow.

March 24, 2026

• 3 min read

Liquid‑cooled AI systems make storage an active cooling and GPU partner

Why does this matter now? As AI models swell and GPU farms push power envelopes, designers are turning to liquid‑cooled chassis to keep chips from throttling.

March 24, 2026

• 3 min read
