AI must stop answering and start finishing tasks, cites OpenHands, SWE‑agent
A new survey paper argues that AI won’t earn the label “coworker” until it stops answering questions and starts delivering finished work. Researchers from Tencent’s Youtu Lab, together with several Chinese universities, map the transition from “chatbot” to “digital colleague” along two axes: the cognitive core and tool‑assisted task execution. Their central claim flips the usual benchmark—rather than chasing better answers, the focus shifts to reliably turning intent into completed tasks within persistent work environments.
The authors trace the evolution from the early "fast answer" era, where models stored language patterns in parameters and generated text token by token without checking intermediate steps, to the emerging "thinking-LLM" era. OpenAI's o1 and DeepSeek-R1 exemplify this shift: they spend extra compute on answer generation, produce long chains of thought, and are trained with rewards given only for verifiably correct solutions. Borrowing Daniel Kahneman's System 1 versus System 2 framework, the paper frames the change as a move from intuitive, rapid responses to deliberate, self-correcting reasoning. First-generation agents could call APIs or browse the web, yet they remained fragile; the next step, the authors suggest, lies in reusable "skills" that let AI operate as a true digital teammate.
Files, sessions, logs, browsers, permissions, and skills all survive across the entire workflow. The paper cites OpenHands and SWE-agent, both of which embed agents in controlled development environments.
Workspace plus skill as the missing link
The paper's core argument is that combining workspace and skill is what enables the real performance leap.
A workspace provides state, storage, and consequences, while a skill packages operational knowledge into reusable bundles. Anthropic's Agent Skills already formalize this pattern as folders containing a SKILL.md file with instructions, scripts, and resources. According to the researchers, skills aren't prompts, and they aren't traditional tools either.
They sit between the model's reasoning and workspace execution, letting organizations capture know-how in modular, testable, portable form. But the authors also warn that reusable procedures can go stale, overfit to specific workflows, or become attack vectors.
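To make the workspace-plus-skill pairing concrete, here is a minimal Python sketch. It is not taken from the paper, OpenHands, SWE-agent, or Anthropic's tooling. The `Skill`, `load_skill`, and `Workspace` names, the `log.jsonl` file, and the `release-notes` folder are all hypothetical, and the only detail borrowed from the text is that a skill is a folder containing a `SKILL.md` file alongside scripts and resources.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Skill:
    """Hypothetical in-memory view of a skill folder."""
    name: str
    instructions: str                      # raw text of SKILL.md
    files: list[str] = field(default_factory=list)  # bundled scripts/resources


def load_skill(folder: Path) -> Skill:
    """Read a skill folder; raise if it has no SKILL.md."""
    manifest = folder / "SKILL.md"
    if not manifest.is_file():
        raise FileNotFoundError(f"{folder} has no SKILL.md")
    others = sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and p != manifest
    )
    return Skill(folder.name, manifest.read_text(encoding="utf-8"), others)


class Workspace:
    """Assumed model of a workspace: files plus an append-only log on disk."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.log_path = self.root / "log.jsonl"

    def write(self, rel_path: str, text: str) -> None:
        """Write a file and record the action so later steps can see it."""
        target = self.root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        with self.log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"event": "write", "path": rel_path}) + "\n")

    def history(self) -> list[dict]:
        """Return every logged action, including ones from earlier sessions."""
        if not self.log_path.exists():
            return []
        lines = self.log_path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines if line]


def demo() -> tuple[Skill, list[dict]]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        skill_dir = base / "skills" / "release-notes"
        (skill_dir / "scripts").mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("Summarize merged changes.\n", encoding="utf-8")
        (skill_dir / "scripts" / "collect.sh").write_text("git log --oneline\n", encoding="utf-8")

        skill = load_skill(skill_dir)
        Workspace(base / "workspace").write("notes/draft.md", f"Using skill: {skill.name}\n")

        # A fresh Workspace object on the same root still sees the earlier work.
        return skill, Workspace(base / "workspace").history()
```

In this sketch the skill is nothing more than files on disk, which is what makes it portable and testable, while the workspace log is what lets a later session pick up where an earlier one stopped. A real system would also need the permission checks and staleness safeguards the authors warn about.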
Why this matters
Can an AI truly be a coworker if it only answers? The new survey paper from Tencent's Youtu Lab and partner universities says no, arguing that reliability hinges on agents that finish whole tasks within persistent workspaces. It maps the transition from "chatbot to digital colleague" along two axes, a cognitive core and tool-assisted execution, and points to OpenHands and SWE-agent as early examples that embed agents in controlled development environments.
Files, sessions, logs, browsers, permissions, and skills all survive across the entire workflow, the authors note, suggesting that workspace plus skill is the missing link. Yet the paper offers no concrete roadmap for scaling such systems beyond narrow development settings. We remain skeptical about how quickly reusable “skills” will generalize to broader business contexts.
Moreover, it is unclear whether the proposed architecture can handle the security and privacy constraints of real‑world enterprises. For developers and founders, the takeaway is a reminder that answering questions is only a first step; delivering end‑to‑end results will require substantial engineering effort and validation.