Research & Benchmarks Articles - Complete AI News Archive
344 articles in this category • Page 1 of 4
- 1. NVIDIA BioNeMo wraps CPU layer with DistributedTriangleMultiplication
- 2. Google staff urge Sundar Pichai to reject classified military AI projects
- 3. AI framework autonomously optimizes data, models, algorithms, outperforms humans
- 4. MolClaw Introduces Autonomous Agent for Hierarchical Drug Screening
- 5. Lakehouse concept drives AI data access for thousands of enterprise users
- 6. Fine-tuning RAG embeddings may drop retrieval accuracy 40%, study finds
- 7. AI pipelines show silent failures from orchestration drift, detected weeks later
- 8. OSWorld Benchmark Evaluates LLMs on Real Computer Use, Unlike Text‑Only Tests
- 9. PageIndex Retrieves via Reasoning Using OpenAI gpt-5.4 Model
- 10. Discord Users Access Anthropic's Mythos AI Tool Without Authorization
- 11. Google DeepMind's Vision Banana Outperforms SAM 3 and Depth Anything V3
- 12. DeepMind spinoff’s AI‑designed drugs enter human trials after AlphaFold 3
- 13. CoALA paper defines agent memory types: procedural rules and semantic facts
- 14. Google DeepMind's Decoupled DiLoCo hits 88% goodput despite hardware failures
- 15. Agent observability powers production evaluation through trace analysis
- 16. Xiaomi launches MiMo‑V2.5‑Pro and V2.5, matching benchmarks at lower token cost
- 17. Designing Production-Grade CAMEL Multi-Agent Systems: Start with Docs and GitHub
- 18. Multi-agent AI systems incur higher token costs than single agents in practice
- 19. Reinforcement learning trains AI like OpenAI's o1 to admit uncertainty
- 20. LangSmith adds reusable LLM-as-judge and rule-based code evaluator templates
- 21. AI made up over a third of new sites by 2025; Pope warning flagged as AI
- 22. Sergey Brin pushes DeepMind to match Claude, unveils agent skills catalog
- 23. Fortnite adds AI‑powered NPCs for unscripted player conversations
- 24. TabPFN hits 98.8% accuracy in 0.47 s, beating Random Forest and CatBoost
- 25. NVIDIA PhysicsNeMo Tutorial Maps k(x,y) to u(x,y) for Darcy Flow
- 26. OpenAI unveils GPT‑Rosalind, AI model to speed drug discovery and genomics
- 27. Standard LLM guidelines focus on training costs, overlook inference budget
- 28. GPT‑Rosalind life‑sciences plugin for Codex launches on GitHub
- 29. OpenAI launches GPT-Rosalind, hits top score on BixBench benchmark
- 30. Frontier AI models fail one in three production runs, audits grow harder
- 31. Meta researchers unveil hyperagents for self‑improving AI in non‑coding tasks
- 32. Claude outperforms humans on alignment task, but results disappear in production
- 33. Google DeepMind unveils Gemini Robotics‑ER 1.6, beats prior model in tool count
- 34. UK tests Mythos AI, noting its ability to chain multistep attacks
- 35. AI Forum Launches Professional Certificate and USD 120M Fund for AI Fluency
- 36. Databricks finds multi-step agents beat single-turn RAG by 21% to 38% on STaRK
- 37. Stanford AI Index 2026: 53% adopt generative AI in 3 years, education lags
- 38. NVIDIA, UMD release AF-Next audio model, beats Phi-4-mm by 12 points on Arabic
- 39. Developers Claim Measured Drop in Claude's Performance, Sparking Nerf Debate
- 40. Seven AI agents in finance lift cash flow >3% monthly, boost productivity 50%
- 41. Meta AI and KAUST Propose Neural Computers Merging Compute, Memory, I/O
- 42. Prediction drift can mask security model decay despite stable accuracy
- 43. Researchers say OpenAI's Sora and Google's Veo aren't true world models
- 44. TriAttention KV Cache Compression Matches Full Attention, 2.5× Faster
- 45. Knowledge Distillation Keeps Student Model Capacity to Match Ensemble Boundaries
- 46. Google AI's PaperOrchestra boosts manuscript success, 79‑81% win rate
- 47. OSGym runs 1,000+ OS replicas at USD 0.23/day with decentralized state management
- 48. Stanford study finds AI agent handoffs lose information, affecting compute cost
- 49. Meta Superintelligence Labs launches Muse Spark, its first multimodal AI model
- 50. Better Harness updates add usage examples, chaining guide, and tool clarifications
- 51. Study finds ‘bot’ term used 16,232 times in 2.8M Telegram messages
- 52. Google AI Overviews answers 91% of test questions correctly after Gemini 3 update
- 53. MaxToki AI boosts context to 16,384 tokens with RoPE scaling
- 54. Meta staff inflate AI token counts on internal leaderboard, wasting resources
- 55. MassMutual, Mass General Brigham turn AI pilot sprawl into production
- 56. OpenAI safety staff exit as Altman dismisses Pentagon contract concerns
- 57. OpenAI urges firms to fund pensions, health, childcare as AI cuts costs
- 58. Study shows sycophantic AI chatbots can outwit ideal rational users
- 59. Americans use AI more than ever but trust it less, Quinnipiac poll shows
- 60. Study maps developer frustration with AI slop as tragedy of the commons
- 61. Google study: AI benchmarks ignore human disagreement; under 10 raters fail
- 62. Alibaba's Qwen team adds method that lengthens AI answers, prompting reasoning
- 63. Open models cross threshold; frontier models show per‑category correctness
- 64. Batch Mode VC-6 and NVIDIA Nsight Speed Up Vision AI Pipelines
- 65. CaP-Agent0 Beats Human Code on 4 of 7 Robot Tasks Using Low‑Level Blocks
- 66. NVIDIA breaks MLPerf records with 288 GPUs as AMD, Intel pursue other goals
- 67. NVIDIA's 288-GPU Blackwell Ultra Sets New MLPerf Inference Throughput Record
- 68. DeepMind study finds six traps that let a few poisoned docs hijack AI agents
- 69. AI productivity gap: top agent beats baseline in 1 of 15 runs, 26.5% subtasks
- 70. NVIDIA's DLSS 4.5 beta adds 6x Multi Frame Generation for RTX 50 GPUs
- 71. AI sycophancy cuts apologies, raises double‑downs; lifts moral trust
- 72. AI models fabricate image descriptions; benchmarks miss the shortcuts
- 73. Cohere's open-weight ASR model reaches 5.4% WER, ready for production use
- 74. Free API that evolved from slow web search to top AI tool, beyond scraping
- 75. Meta unveils open-source brain AI, adds Scrunch site audit and Suno v5.5
- 76. AI assurance experts meet to build infrastructure for safe, high‑quality systems
- 77. Study finds overly flattering AI advice can impair users' judgment
- 78. xMemory reduces token usage and context bloat versus MemGPT's raw logging
- 79. Mozilla dev launches cq, a Stack Overflow‑style hub for agents
- 80. Liquid‑cooled AI systems make storage an active cooling and GPU partner
- 81. 10 X Accounts for LLM Updates, Including the ‘Largest AI Newsletter’
- 82. Teens await sentencing for AI‑generated nude images as parents sue school
- 83. Developers say AI‑generated games feel unlike human‑made; audiences don't connect
- 84. Hachette withdraws Shy Girl horror novel amid AI usage concerns
- 85. Scale AI's Voice Showdown ranks Qwen ahead of top models, highlights failures
- 86. SynthID uses steganography to embed hidden watermarks in data
- 87. Google Search experiments with AI-generated headlines, may expand rollout
- 88. Growing cultural disconnect as companies race to deploy AI rapidly
- 89. Deep AI adopters reshape workflow, borrowing product‑manager tactics
- 90. NVIDIA DGX Spark expands node support to four, doubling memory capacity
- 91. Google's MusicFX DJ Enables Real-Time Controllable AI Music Generation
- 92. Paper identifies simple games that defeat AlphaGo and AlphaChess training
- 93. NVIDIA Cosmos Transfer Enables Scalable Synthetic Data for Physical AI
- 94. Trump Administration Signals Possible Additional Sanctions on Anthropic at Hearing
- 95. YouTube extends AI deepfake detection to politicians, journalists
- 96. Karpathy releases open-source Autoresearch, runs hundreds of AI tests nightly
- 97. AI spots trends but misses significance, keeping humans essential
- 98. Large CUDA Tiles Reduce Flash Attention TFLOPS by 18‑43% Across Sequences
- 99. KV cache compaction cuts LLM memory 50×, with chunked processing for long contexts
- 100. AI system flags probable matches, narrows anonymous accounts to shortlist