Research & Benchmarks
Academic AI research, performance benchmarks, scientific breakthroughs, and peer-reviewed studies advancing artificial intelligence frontiers.
Google DeepMind’s latest model, dubbed Vision Banana, has just topped two well‑known benchmarks: it outperformed Meta’s SAM 3 on segmentation and eclipsed Depth Anything V3 on metric depth estimation.
The latest milestone for AI‑driven drug discovery arrived quietly on the clinical front: a DeepMind spinoff has moved its first computer‑crafted compounds into human testing.
Building an agent that can act consistently isn’t just about cranking out a clever prompt. It’s about giving that software a place to store what it knows, how it decides, and what it has already done.
Google DeepMind’s latest paper unveils Decoupled DiLoCo, an asynchronous training framework that keeps more than eight in ten chips busy even when a sizable slice of the hardware drops out.
When you push an AI assistant from a sandbox into real‑world use, the interaction patterns suddenly explode. Users ask questions you never imagined, combine intents, and trigger edge‑case behavior that no test suite covered.
Xiaomi’s latest AI rollout—MiMo‑V2.5‑Pro and its lighter‑weight sibling MiMo‑V2.5—promises the same headline‑grabbing benchmark scores as leading frontier models while slashing the token cost required for inference.
Designing a production‑grade CAMEL multi‑agent system isn’t just about swapping in the latest planning algorithm or tinkering with tool‑use hooks.
Why does the cost of running AI matter beyond headline‑grabbing accuracy numbers?
Why does it matter when a model can openly admit it doesn’t know? While the hype around ever‑larger language models persists, a quieter shift is happening in how those systems are taught to think.
LangSmith is expanding its toolkit for developers who need to measure how well their language‑model agents perform in the wild.
Why does a fake papal statement matter to anyone who builds a website? The episode began when a detection tool flagged a viral warning attributed to the Pope as AI‑generated, sparking a debate about how often machines masquerade as human voices...
Sergey Brin has put his weight behind DeepMind’s bid to close the gap with Anthropic’s Claude, signaling a strategic shift for the Google‑backed lab.
Fortnite is giving creators a shortcut that feels almost like a cheat code for storytelling.
Why does a model that skips traditional training matter? While most tabular learners spend minutes—or even hours—building trees, TabPFN leans on in‑context learning, essentially treating the dataset as a prompt.
The tutorial walks you through building a Darcy‑flow surrogate with NVIDIA’s PhysicsNeMo library. It stitches together Fourier neural operators (FNOs) and physics‑informed neural networks (PINNs) into a single, reproducible pipeline.
OpenAI’s latest release marks a distinct turn for the company, which has spent most of its public life building general‑purpose chatbots.
Why does the cost balance matter when you’re actually using a model? Companies pour billions into training massive language models, yet the bill doesn’t stop there.
Why does a new Codex add‑on matter to bench scientists? While AI assistants have been sprouting across tech circles, few have been packaged specifically for the nitty‑gritty of life‑sciences work.
OpenAI’s latest foray into the life‑science arena arrives as a tightly scoped model named GPT‑Rosalind, rolled out on a limited‑access basis alongside an expanded Codex plugin on GitHub.
A new Stanford analysis paints a stark picture of the frontier AI field. In production, one‑third of model deployments stumble, and the very tools used to gauge progress are slipping out of reach.