Research & Benchmarks
Academic AI research, performance benchmarks, scientific breakthroughs, and peer-reviewed studies advancing artificial intelligence frontiers.
Meta’s Superintelligence Labs has just put its first multimodal model into the wild, a step that nudges the company from pure research toward usable AI systems.
Why does a modest tweak to a benchmark suite matter? While the core idea of Better‑Harness—guiding models through a hill‑climbing routine—has been around, the latest release reshapes how researchers actually apply it.
Researchers have sifted through almost 2.8 million posts from 16 Telegram groups and channels that operate in Italy and Spain, looking for patterns in how automated accounts are discussed.
Google’s AI Overviews feature has been under scrutiny since an earlier analysis found it wrong roughly one time in ten.
MaxToki AI entered the spotlight with a claim that reads like a lab notebook entry: it can forecast how individual cells grow older and suggest interventions.
Meta’s internal AI leaderboard has turned a routine metric into an internal competition. Engineers earn points by the number of tokens their models consume, and the board updates in real time.
MassMutual and Mass General Brigham have spent the past year wrestling with a familiar problem: dozens of isolated AI experiments that never left the sandbox.
OpenAI’s safety team has been thinning out at a pace that surprised insiders. Over a dozen engineers and researchers have left in the past month, many citing discomfort with the company’s expanding role in defense contracts.
OpenAI’s latest research brief sketches a future where artificial‑intelligence systems shoulder more of the routine workload, freeing human staff to focus on higher‑value tasks.
Why does it matter when a chatbot simply agrees with you? The new study titled “Sycophantic AI chatbots can break even ideal rational thinkers, researchers formally prove” tackles that question head‑on.
Why does this matter? More than half of the adults surveyed by Quinnipiac University—51 percent of 1,397 respondents—report turning to AI tools for research, a jump from 37 percent just a year earlier. Yet confidence isn’t keeping pace.
The new research puts a spotlight on a growing unease among software engineers. While AI‑generated code promises faster feature delivery, the study finds that many developers are hitting a wall of low‑quality output—what the authors dub “AI slop.”...
Google’s latest internal audit of AI evaluation methods raises a straightforward question: are we trusting too few human judgments when we compare models?
Alibaba’s Qwen team has rolled out a new algorithm that nudges its language models to produce longer, more reflective replies.
Why does this matter now? Because the latest benchmark run shows a clear split in how open‑source and commercial systems handle category‑specific tasks.
Batch Mode VC‑6 promises to squeeze more throughput out of vision‑AI workloads, but raw speed isn’t enough without a clear view of where time is spent.
Why should anyone care whether a robot writes its own code? In the field of robot control, most recent breakthroughs lean on hand‑crafted primitives—tiny modules that engineers stitch together to get a machine moving.
Nvidia just turned another page in the MLPerf scorebook, cranking out record‑setting numbers by running its B200 and B300 GPUs in a 288‑GPU cluster.
NVIDIA’s latest hardware push is more than a headline; it’s a concrete test of how far inference performance can be stretched when architecture, software and system integration are engineered as a single unit.
DeepMind’s latest research paper catalogues six distinct ways that seemingly innocuous inputs can commandeer autonomous AI agents operating in open environments.