Research & Benchmarks Articles - Complete AI News Archive
316 articles in this category • Page 1 of 4
- 1. Frontier AI models fail one in three production runs, audits grow harder
- 2. Meta researchers unveil hyperagents for self‑improving AI in non‑coding tasks
- 3. Claude outperforms humans on alignment task, but results disappear in production
- 4. Google DeepMind unveils Gemini Robotics‑ER 1.6, beats prior model in tool count
- 5. UK tests Mythos AI, noting its ability to chain multistep attacks
- 6. AI Forum Launches Professional Certificate and USD 120M Fund for AI Fluency
- 7. Databricks finds multi-step agents beat single-turn RAG by 21% to 38% on STaRK
- 8. Stanford AI Index 2026: 53% adopt generative AI in 3 years, education lags
- 9. NVIDIA, UMD release AF-Next audio model, beats Phi-4-mm by 12 points on Arabic
- 10. Developers Claim Measured Drop in Claude's Performance, Sparking Nerf Debate
- 11. NVIDIA PhysicsNeMo Tutorial Maps k(x,y) to u(x,y) for Darcy Flow
- 12. Seven AI agents in finance lift cash flow >3% monthly, boost productivity 50%
- 13. Meta AI and KAUST Propose Neural Computers Merging Compute, Memory, I/O
- 14. Prediction drift can mask security model decay despite stable accuracy
- 15. Researchers say OpenAI's Sora and Google's Veo aren't true world models
- 16. TriAttention KV Cache Compression Matches Full Attention, 2.5× Faster
- 17. Knowledge Distillation Keeps Student Model Capacity to Match Ensemble Boundaries
- 18. Google AI's PaperOrchestra boosts manuscript success, 79‑81% win rate
- 19. OSGym runs 1,000+ OS replicas at USD 0.23/day with decentralized state management
- 20. Stanford study finds AI agent handoffs lose information, affecting compute cost
- 21. Meta Superintelligence Labs launches Muse Spark, its first multimodal AI model
- 22. Better Harness updates add usage examples, chaining guide, and tool clarifications
- 23. Study finds ‘bot’ term used 16,232 times in 2.8M Telegram messages
- 24. Google AI Overviews answers 91% of test questions correctly after Gemini 3 update
- 25. MaxToki AI boosts context to 16,384 tokens with RoPE scaling
- 26. Meta staff inflate AI token counts on internal leaderboard, wasting resources
- 27. MassMutual, Mass General Brigham turn AI pilot sprawl into production
- 28. OpenAI safety staff exit as Altman dismisses Pentagon contract concerns
- 29. OpenAI urges firms to fund pensions, health, childcare as AI cuts costs
- 30. Study shows sycophantic AI chatbots can outwit ideal rational users
- 31. Americans use AI more than ever but trust it less, Quinnipiac poll shows
- 32. Study maps developer frustration with AI slop as tragedy of the commons
- 33. Google study: AI benchmarks ignore human disagreement; under 10 raters fail
- 34. Alibaba's Qwen team adds method that lengthens AI answers, prompting reasoning
- 35. Open models cross threshold; frontier models show per‑category correctness
- 36. Batch Mode VC-6 and NVIDIA Nsight Speed Up Vision AI Pipelines
- 37. CaP-Agent0 Beats Human Code on 4 of 7 Robot Tasks Using Low‑Level Blocks
- 38. Nvidia breaks MLPerf records with 288 GPUs as AMD, Intel pursue other goals
- 39. NVIDIA's 288-GPU Blackwell Ultra Sets New MLPerf Inference Throughput Record
- 40. DeepMind study finds six traps that let a few poisoned docs hijack AI agents
- 41. AI productivity gap: top agent beats baseline in 1 of 15 runs, 26.5% subtasks
- 42. Nvidia's DLSS 4.5 beta adds 6x Multi Frame Generation for RTX 50 GPUs
- 43. AI sycophancy cuts apologies, raises double‑downs; lifts moral trust
- 44. AI models fabricate image descriptions; benchmarks miss the shortcuts
- 45. Cohere's open-weight ASR model reaches 5.4% WER, ready for production use
- 46. Free API that evolved from slow web search to top AI tool, beyond scraping
- 47. Meta unveils open-source brain AI, adds Scrunch site audit and Suno v5.5
- 48. AI assurance experts meet to build infrastructure for safe, high‑quality systems
- 49. Study finds overly flattering AI advice can impair users' judgment
- 50. xMemory reduces token usage and context bloat versus MemGPT's raw logging
- 51. Mozilla dev launches cq, a Stack Overflow‑style hub for agents
- 52. Liquid‑cooled AI systems make storage an active cooling and GPU partner
- 53. 10 X Accounts for LLM Updates, Including the ‘Largest AI Newsletter’
- 54. Teens await sentencing for AI‑generated nude images as parents sue school
- 55. Developers say AI‑generated games feel unlike human‑made; audiences don't connect
- 56. Hachette withdraws Shy Girl horror novel amid AI usage concerns
- 57. Scale AI's Voice Showdown ranks Qwen ahead of top models, highlights failures
- 58. SynthID uses steganography to embed hidden watermarks in data
- 59. Google Search experiments with AI-generated headlines, may expand rollout
- 60. Growing cultural disconnect as companies race to deploy AI rapidly
- 61. Deep AI adopters reshape workflow, borrowing product‑manager tactics
- 62. NVIDIA DGX Spark expands node support to four, doubling memory capacity
- 63. Google's MusicFX DJ Enables Real-Time Controllable AI Music Generation
- 64. Paper identifies simple games that defeat AlphaGo and AlphaChess training
- 65. NVIDIA Cosmos Transfer Enables Scalable Synthetic Data for Physical AI
- 66. Trump Administration Signals Possible Additional Sanctions on Anthropic at Hearing
- 67. YouTube extends AI deepfake detection to politicians, journalists
- 68. Karpathy releases open-source Autoresearch, runs hundreds of AI tests nightly
- 69. AI spots trends but misses significance, keeping humans essential
- 70. Large CUDA Tiles Reduce Flash Attention TFLOPS by 18‑43% Across Sequences
- 71. KV cache compaction cuts LLM memory 50×, chunked processing long contexts
- 72. AI system flags probable matches, narrows anonymous accounts to shortlist
- 73. Seven tech giants sign Trump pledge to curb data‑center power cost spikes
- 74. Microsoft's Phi-4 Reasoning Vision 15B offers low‑latency, compact AI
- 75. LangSmith CLI adds three portable skills for coding agents in the repo
- 76. Secret meeting sees 94% approve even least‑popular AI resistance stance
- 77. AI data centers move to Arctic edge, boosting Nordic rural economies
- 78. Microsoft's OPCD cuts system prompts while preserving AI performance
- 79. Wall Street shows persistent AI anxiety, sparking frequent mini‑panics
- 80. Riley Walz, the ‘Jester of Silicon Valley,’ joins OpenAI’s OAI Labs team
- 81. AI enables scientists to integrate multiple cell measurements
- 82. Researchers argue building conscious AI could foster empathy, despite doubts
- 83. AI Researchers Resign, Bots Hire Humans, Anthropic Targeted, Evie Party
- 84. Researchers embed mask token in LLM weights to achieve 3× faster inference
- 85. Run:ai on 64 GPUs serves 10,200 users, matching native scheduler
- 86. Google unveils Gemini 3.1 Pro, hits 94.3% GPQA Diamond and coding Elo 2
- 87. Google launches AI Professional Certificate to boost fluency for workers
- 88. Google.org launches USD 30M AI for Government Innovation Impact Challenge
- 89. Google urges full‑stack, collaborative security to fight bad actors at MSC 2026
- 90. SurrealDB 3.0 stores agent memory, business logic, and multimodal data in one DB
- 91. Anthropic-Pentagon AI feud escalates as You.com co-founders Socher, McCann cited
- 92. AI's new physics discovery; Spotify devs wrote no code this year, CEO says
- 93. Study Finds Stigma Causes Shame for Some in AI Relationships
- 94. Google's upgrade teaches zero-shot selection, embeddings, QA workflows
- 95. MLOps Workflow Normalizes and Enriches Occupational Wage Data from Excel
- 96. Full‑stack resilience: protecting democracies from digital threats to subsea cables
- 97. Anthropic aims to curb costs as it launches USD 50B of data centers in NY, Texas
- 98. Qwen-Image-2.0 renders calligraphy with near‑perfect text, ranks behind Nano Banana Pro
- 99. EU AI Lacks Models and Compute; Germany Urged to Lead Coalition
- 100. New benchmark finds AI still hallucinates despite citing legitimate sources