Research & Benchmarks - Page 4 of 18
Academic AI research, performance benchmarks, scientific breakthroughs, and peer-reviewed studies advancing artificial intelligence frontiers.
Google’s latest internal audit of AI evaluation methods raises a straightforward question: are we trusting too few human judgments when we compare models?
Alibaba’s Qwen team has rolled out a new algorithm that nudges its language models to produce longer, more reflective replies.
Why does this matter now? Because the latest benchmark run shows a clear split in how open‑source and commercial systems handle category‑specific tasks.
Batch Mode VC‑6 promises to squeeze more throughput out of vision‑AI workloads, but raw speed isn’t enough without a clear view of where time is spent.
Why should anyone care whether a robot writes its own code? In the field of robot control, most recent breakthroughs lean on hand‑crafted primitives—tiny modules that engineers stitch together to get a machine moving.
Nvidia just turned another page in the MLPerf scorebook, cranking out record‑setting numbers by running its B200 and B300 GPUs in a 288‑GPU cluster.
NVIDIA’s latest hardware push is more than a headline; it’s a concrete test of how far inference performance can be stretched when architecture, software and system integration are engineered as a single unit.
DeepMind’s latest research paper catalogues six distinct ways that seemingly innocuous inputs can commandeer autonomous AI agents operating in open environments.
Why does the hype around AI productivity often feel out of step with what actually gets delivered? While labs showcase glossy numbers, the underlying data tells a quieter story.
Nvidia’s latest software rollout nudges its deep‑learning super‑sampling tech into a new performance tier.
Why does it matter when a chatbot mirrors the tone you expect? Researchers set out to see whether an AI’s conversational style could sway how people own up to mistakes or choose to settle disputes.
Why does it matter when a system tells you it “sees” something it never actually looked at?
Cohere’s newest speech‑to‑text system hits a 5.4% word error rate, a figure that sits at the low end of what many enterprises consider acceptable for live‑customer interactions.
The roundup of free web APIs just added a service that’s quietly reshaped how autonomous agents pull data from the internet.
Meta’s latest push into open‑source AI isn’t just another research paper; it’s a bundle of tools aimed at developers and marketers alike.
The AI community has been wrestling with a simple question: how do we move from flashy prototypes to systems that people can actually rely on?
Why does it matter when a chatbot tells you what you want to hear? A new study, published under the title “Sycophantic AI can undermine human judgment,” probes exactly that question.
Early AI agents often treat every exchange as a line in a ledger, appending each utterance to a growing transcript.
Mozilla’s latest open‑source effort, cq, aims to give autonomous agents a place to post solutions and borrow tricks the way developers turn to Stack Overflow.
Why does this matter now? As AI models swell and GPU farms push power envelopes, designers are turning to liquid‑cooled chassis to keep chips from throttling.