Research & Benchmarks
Academic AI research, performance benchmarks, scientific breakthroughs, and peer-reviewed studies advancing artificial intelligence frontiers.
Why does an AI need a safety net? The new arXiv paper, “Strategic Decision Support for AI Agents,” treats that question like a math problem.
London Stock Exchange Group is putting its data muscle behind generative AI. The firm, which serves more than 40,000 customers and 400,000 end‑users across roughly 190 markets, has long used machine learning to power financial models.
Why does this matter? Because setting up a Hermes Agent used to be a series of command‑line steps, each prone to typo‑induced headaches.
Why does this matter now? Researchers have long asked whether AI can pull together evidence from multiple studies and produce a trustworthy summary, especially when health decisions hang in the balance.
Hierarchical language agents often stumble not at the final answer but halfway through, when they choose a path without realizing they’re missing key details. Those blind spots show up as wrong branches that cascade into larger errors.
Foundation‑model agents are no longer fleeting chatbots; they’re long‑lived systems that keep track of users across sessions. That shift turns memorization into a deployment‑time function instead of a hidden byproduct of model weights.
All the code for this section lives on GitHub, tucked away in src/selection/logit_model_selection.py, with the accompanying analysis in 08_logistic_model_selection.qmd.
Converting a quantized checkpoint into an NVIDIA TensorRT engine is the missing link between model‑level optimization and real‑world deployment.
Why does this matter? Companies are turning to AI‑enabled tools not just to automate routine work but to shape strategy itself. While the tech is impressive, the new arXiv preprint 2606.10044v1—titled *Business World Model*—offers a different angle.
Federated learning (FL) research often starts with a deceptively simple question: what should we try next?
Why does this matter? Because the newest wave of large‑language‑model reasoning hinges less on bigger datasets and more on how models handle inference.
AI coding assistants are being tested on a full‑scale fly optogenetics workflow—a data‑to‑discovery pipeline that normally consumes days or months of specialist time.
FIFA rolls out the first match of the 2026 World Cup on Thursday, June 11, at Mexico City’s new stadium, and a data‑driven fan decided to test how far machine learning can go.
Reddit’s r/ChangeMyView recently became the focus of an unusual research effort. Unidentified outside researchers inserted undisclosed, AI‑driven accounts into live debates as part of a field experiment that was later shut down after ethical concerns were...
We’re deep into developer conference season, and the buzz is unmistakable: Big Tech is convinced AI will rewrite how we work, play and create.
Why does benchmark coverage matter for massive language models? The authors argue we’ve been looking at the wrong slice of performance.
The National Science Foundation has just extended its backing of the MIT‑led Institute for Artificial Intelligence and Fundamental Interactions (IAIFI) for another five years, nudging the annual grant from $4 million to $4.98 million.
The rush to embed AI in writing, design and analysis promised speed, but the reality is messier. While a single prompt could shave hours off a task, today practitioners juggle a growing menu of agents.
Why does this matter? In remote regions, a single forest inventory plot can cost as much as a modern computer used for training a model. The reality is that field measurements are a bottleneck for any spatial prediction task.
Something has shifted at the intersection of AI and data science, and it’s already changing how practitioners work.