arxiv:2606.08671

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

Published on Jun 23
Submitted by Zhiwei Li on Jul 1
Abstract

SkillHone enables continuous evolution of agent skills by maintaining persistent decision histories and incorporating practice feedback for improved performance across research and tool-mediated analysis tasks.

Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need to interpret prior revisions, evaluations, and rejected alternatives. We introduce SkillHone, a harness for continual agent skill evolution grounded in persistent decision history. SkillHone pairs skill revisions with evaluation-side evidence that supplies practice feedback, recording structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes with redacted reporting and propose revisions informed by prior decisions, enabling cross-session refinement without rediscovering past rationale. On deep-research benchmarks, SkillHone runs without a pre-integrated search stack and outperforms the commercially backed deep-research agent by 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, while also exceeding prior skill-evolution methods. We further deploy SkillHone on internal tool-mediated analysis scenarios, where it improves accuracy by an average of 18.8 points across seven settings.
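To make the idea of a persistent decision history concrete, here is a minimal sketch of how one entry might be stored and consulted across sessions. The abstract names the four recorded elements (diagnoses, revisions, evidence, outcomes) and mentions rejected alternatives; everything else below, including the `DecisionRecord` schema, the field names, the JSON Lines file, and the `rejected_before` helper, is an assumption made for illustration and is not taken from the SkillHone implementation.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class DecisionRecord:
    """One entry in a skill's decision history (hypothetical schema)."""
    revision_id: str
    diagnosis: str        # what was observed to go wrong before this revision
    revision: str         # the change proposed for the skill
    evidence: List[str]   # evaluation-side observations about the change
    outcome: str          # assumed values: "accepted" or "rejected"
    rejected_alternatives: List[str] = field(default_factory=list)


def append_record(path: str, record: DecisionRecord) -> None:
    """Append the record as one JSON line so it survives across sessions."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_history(path: str) -> List[DecisionRecord]:
    """Read all records back in the order they were written."""
    with open(path, encoding="utf-8") as f:
        return [DecisionRecord(**json.loads(line)) for line in f if line.strip()]


def rejected_before(history: List[DecisionRecord], proposal: str) -> bool:
    """True if the proposal was already tried and rejected, or listed as a
    rejected alternative, in an earlier record."""
    return any(
        (proposal == r.revision and r.outcome == "rejected")
        or (proposal in r.rejected_alternatives)
        for r in history
    )
```

A later session could call `load_history` before proposing a change and skip any proposal for which `rejected_before` returns true, which is one simple way to avoid rediscovering past rationale.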

Community

Paper author Paper submitter

🚀 Excited to share SkillHone, a harness for continual agent skill evolution through persistent decision history.

The core idea is simple: agent skills should not only keep the final optimized artifact, but also preserve the decision history behind each revision — diagnoses, rejected alternatives, evaluation evidence, and outcomes. This allows later agents to continue improving a skill across sessions instead of rediscovering the same failures. 🧠

In our implementation, SkillHone uses role-separated optimization/evaluation agents and redacted practice feedback to evolve portable skills. On deep-research benchmarks, SkillHone improves over prior skill-evolution methods and performs strongly in raw open-web settings without relying on a pre-integrated search stack. 🔁
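The separation between evaluation and optimization roles, with redacted practice feedback passing between them, might look roughly like the sketch below. The post does not say what is redacted or how probes are scored, so the exact-match scoring, the `ProbeResult` fields, and the choice to hide reference answers from the optimizer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeResult:
    probe_id: str
    passed: bool
    expected: str      # reference answer, kept on the evaluation side only
    failure_note: str  # short description of the failure, empty on success


def evaluate(skill: Callable[[str], str], probes: List[Dict[str, str]]) -> List[ProbeResult]:
    """Evaluator role: run the candidate skill on each practice probe.

    Scoring here is a case-insensitive exact match, chosen only to keep
    the sketch small.
    """
    results = []
    for probe in probes:
        answer = skill(probe["question"])
        passed = answer.strip().lower() == probe["expected"].strip().lower()
        note = "" if passed else "answer did not match the reference"
        results.append(ProbeResult(probe["id"], passed, probe["expected"], note))
    return results


def redact(results: List[ProbeResult]) -> List[Dict[str, object]]:
    """Build the report for the optimizer role, omitting reference answers
    so a revision cannot simply memorize them."""
    return [
        {"probe_id": r.probe_id, "passed": r.passed, "failure_note": r.failure_note}
        for r in results
    ]
```

In this arrangement the optimizer only ever receives the output of `redact`, so its proposed revisions have to address the reported failure rather than the specific probe answer.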

Links:
📄 arXiv: https://arxiv.org/abs/2606.08671
🌐 Project page: https://zwlijay.github.io/SkillHone-Project
🛠️ Skills: https://github.com/Tencent/SkillHone
