Research & Benchmarks
Academic AI research, performance benchmarks, scientific breakthroughs, and peer-reviewed studies advancing artificial intelligence frontiers.
Meta’s latest research paper rolls out something it calls “hyperagents,” a framework meant to push self‑modifying AI beyond the narrow world of code generation.
Claude, Anthropic’s flagship model, recently topped human researchers on a benchmark designed to test alignment—how well an AI follows the intentions of its operators.
Google DeepMind’s latest release, Gemini Robotics‑ER 1.6, pushes physical AI a step further.
The United Kingdom’s security laboratory has taken a hard look at Mythos, an artificial‑intelligence system touted for its offensive capabilities.
The AI for the Economy Forum convened a diverse crowd of policymakers, educators and industry leaders, all grappling with how societies will adapt as artificial intelligence reshapes work and commerce.
Databricks’ latest study pits its most capable language model against a newly designed multi‑step reasoning agent across three semi‑structured retrieval tasks.
Why does the speed of generative‑AI uptake matter now? The Stanford AI Index 2026 paints a picture of technology spreading at a pace that eclipses both the personal computer and the early internet.
NVIDIA and researchers at the University of Maryland have just put a new name on the table for large‑scale audio‑language work.
Developers have been sounding the alarm for weeks, pointing to slower response times, fuzzier reasoning and a noticeable dip in Claude’s output quality.
Why does this matter? Because most AI pilots in banking stop at proof‑of‑concept, leaving firms unsure whether the technology actually moves the needle on core metrics.
Meta AI has teamed up with researchers at Saudi Arabia’s King Abdullah University of Science and Technology (KAUST) to outline a new class of “Neural Computers.” The proposal sketches a system where the neural model itself performs the roles...
Why do security teams keep staring at a model’s accuracy score while attacks keep slipping through? The answer often lies in what the numbers don’t show.
The buzz around text‑to‑video AI has been louder than the tech itself. OpenAI’s Sora, launched and then pulled, was instantly tagged a “world simulator” by many observers.
Researchers from MIT, NVIDIA, and Zhejiang University have introduced TriAttention, a KV‑cache compression technique that claims to keep the quality of full‑attention models while delivering more than double the throughput.
Why does the size of a distilled model matter? When researchers compress an ensemble—a collection of heavyweight neural nets—into a single deployable student, they must balance two competing pressures.
Google’s latest AI research tool, PaperOrchestra, promises to automate much of the manuscript‑writing process by chaining together several specialized agents.
Running a thousand operating‑system instances for a single research project used to sound like a budget nightmare.
The Stanford team set out to answer a practical question: when does splitting a task among multiple AI agents actually save resources, and when does it backfire?
Meta’s Superintelligence Labs has just put its first multimodal model into the wild, a step that nudges the company from pure research toward usable AI systems.
Why does a modest tweak to a benchmark suite matter? While the core idea of Better‑Harness—guiding models through a hill‑climbing routine—has been around, the latest release reshapes how researchers actually apply it.