Tag
English articles and guides tagged Agent Skills.
If an eval harness can only tell you the success or failure of a task, but cannot explain whether the agent called the correct capabilities, in what environment it was executed, why it failed, and why it succeeded, then what it gives is not a systematic judgment, but just a score card. This article is based on LangChain's discussion of skills eval and extends my complete understanding of artifact-based scoring, invocation metrics, trace design, workflow eval and evaluation histology.