Agent Applications rely on lightweight discovery metadata before full activation. That makes the description field important in both APP.md and local SKILL.md files. An under-specified description means the right package or skill is missed; an over-broad one causes false activations and wastes context.

How discovery works

A compatible runtime loads lightweight metadata before deciding whether to activate the full package contract or one of its local skills. That metadata typically includes:
  • package name
  • slug
  • description
  • version
  • key command signals
The description is often the most signal-dense field in that metadata. Runtimes use it to decide whether a package is relevant to the current task.
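The metadata fields above can be sketched as a small structure. This is a minimal illustration, not a real schema: `PackageMetadata` and `looks_relevant` are hypothetical names, and an actual runtime judges relevance with a model rather than keyword overlap.

```python
from dataclasses import dataclass

@dataclass
class PackageMetadata:
    name: str
    slug: str
    description: str     # often the most signal-dense field
    version: str
    commands: list[str]  # key command signals

def looks_relevant(meta: PackageMetadata, task: str) -> bool:
    # Naive word overlap as a stand-in for the runtime's real
    # relevance judgment (which is model-based, not string matching).
    task_words = set(task.lower().split())
    desc_words = set(meta.description.lower().split())
    return len(task_words & desc_words) >= 2

todo = PackageMetadata(
    name="todo-app",
    slug="todo-app",
    description="Persistent to-do list operated through a JSON-first CLI "
                "with explicit confirmation for destructive actions.",
    version="1.0.0",
    commands=["add", "list", "remove"],
)
print(looks_relevant(todo, "safely remove an item from the to-do list"))
```

Even this toy check shows why the description carries the decision: the other fields rarely share vocabulary with a task.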

Write descriptions around intent

Good descriptions explain when the package matters, not just what it is.
Write your description as if a runtime is scanning a catalog of packages and needs to know in one sentence whether yours applies to the current task.
Prefer:
description: Persistent to-do list operated through a JSON-first CLI with explicit confirmation for destructive actions.
Over:
description: To-do app.
The stronger description communicates:
  • the work context (persistent to-do list)
  • the command style (JSON-first CLI)
  • a relevant safety constraint (explicit confirmation for destructive actions)
Useful patterns for stronger descriptions:
  • describe the work context
  • mention the kind of tasks the package supports
  • include adjacent signals such as safety constraints, state model, or command style
  • keep it concise enough to stay readable in a catalog
Descriptions that name the application but omit what it does cause the most misses. A name alone is rarely enough for a runtime to activate the right package in context.
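The patterns above can be turned into rough automated checks. This is a hypothetical linter sketch, not a real tool; the thresholds and messages are illustrative assumptions.

```python
def description_issues(description: str) -> list[str]:
    # Flag descriptions that are likely to cause discovery misses.
    issues = []
    if len(description.split()) < 5:
        # A bare name ("To-do app.") carries almost no task signal.
        issues.append("too short to carry context (likely just a name)")
    if len(description) > 200:
        # Long descriptions stop being readable in a catalog scan.
        issues.append("too long to stay readable in a catalog")
    return issues

print(description_issues("To-do app."))
print(description_issues(
    "Persistent to-do list operated through a JSON-first CLI "
    "with explicit confirmation for destructive actions."
))
```

A check like this catches the name-only failure mode mechanically, but it cannot judge whether the description names the right work context; that still needs trigger evals.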

Design trigger evals

Test your descriptions with realistic prompts and planning situations. For each test case, label it should_trigger or should_not_trigger. Examples:
  • "Inspect a local app package and explain which CLI commands mutate state" → should_trigger for the application package
  • "Summarize this static Markdown file" → should_not_trigger for a whole application package
  • "Safely remove an item from the to-do app after user confirmation" → should_trigger for both the package and its local usage skill
The most valuable negative tests are near-misses: prompts that share vocabulary with the package but do not actually need it activated.
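A trigger-eval set like the one above can be expressed as labeled cases plus a small harness. `run_discovery` is an assumed stand-in for however your runtime exposes its activation decision; the near-miss case is an invented example.

```python
CASES = [
    ("Inspect a local app package and explain which CLI commands mutate state",
     "should_trigger"),
    ("Summarize this static Markdown file",
     "should_not_trigger"),
    # Near-miss negative: shares vocabulary with the package but
    # does not actually need it activated.
    ("Alphabetize this list of grocery items in plain text",
     "should_not_trigger"),
]

def evaluate(run_discovery):
    # Return every case whose actual label disagrees with the expected one.
    failures = []
    for prompt, expected in CASES:
        triggered = run_discovery(prompt)  # True if the package activated
        actual = "should_trigger" if triggered else "should_not_trigger"
        if actual != expected:
            failures.append((prompt, expected, actual))
    return failures

# Trivial stand-in discovery function, just to make the harness runnable.
fake = lambda p: "package" in p.lower() or "to-do" in p.lower()
print(len(evaluate(fake)))  # prints 0
```

Keeping expected labels next to prompts makes it cheap to grow the negative set as new near-misses appear.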

Measure false positives and misses

For each test case, check whether the runtime:
  • surfaced the right package or skill
  • avoided loading unrelated packages
  • respected the difference between the base application contract and local operating guidance
Run each case multiple times if the underlying model behavior is nondeterministic. A single passing run does not confirm consistent behavior.
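Repeated runs can be folded into the pass criterion. A sketch, again assuming a hypothetical `run_discovery` hook:

```python
def trigger_rate(run_discovery, prompt: str, runs: int = 5) -> float:
    # Fraction of runs in which the package activated for this prompt.
    hits = sum(1 for _ in range(runs) if run_discovery(prompt))
    return hits / runs

def consistent(run_discovery, prompt: str, expected: str, runs: int = 5) -> bool:
    # A case passes only if every run agrees with the expected label,
    # not just a single lucky run.
    rate = trigger_rate(run_discovery, prompt, runs)
    return rate == 1.0 if expected == "should_trigger" else rate == 0.0

# Deterministic stand-in for the demo; real model behavior may vary per run.
print(consistent(lambda p: "package" in p, "Inspect a local app package",
                 "should_trigger"))
```

Tracking the rate rather than a boolean also surfaces borderline descriptions that trigger, say, three runs out of five.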

Iterate without overfitting

Use a train and validation split when improving descriptions:
  1. Revise descriptions based on train-set failures.
  2. Keep the validation set untouched.
  3. Choose the version that generalizes best across both sets.
Avoid stuffing specific keywords from failed prompts into the description. That tends to overfit to the failure rather than fixing the underlying gap in how the description communicates intent.
Fix the broader concept instead. If a prompt about confirmation-required commands keeps missing, add a clear signal about your safety model rather than copying words from the failing prompt.
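The train/validation procedure reduces to a selection rule. This sketch assumes a hypothetical `score(description, cases)` returning a pass rate; the toy word-overlap scorer below is illustrative only, where a real scorer would run the trigger evals.

```python
def pick_best(candidates, score, train_cases, val_cases):
    # Revise against train failures, but select on BOTH sets so the
    # winner generalizes instead of overfitting the train set.
    return max(candidates,
               key=lambda d: score(d, train_cases) + score(d, val_cases))

def score(description, cases):
    # Toy scorer: fraction of cases sharing any word with the description.
    words = set(description.lower().split())
    return sum(1 for c in cases if set(c.lower().split()) & words) / len(cases)

train = ["remove a to-do item", "add a task to the list"]
val = ["confirm before deleting a to-do"]
best = pick_best(
    ["To-do app.",
     "Persistent to-do list with confirmation for destructive actions"],
    score, train, val,
)
print(best)
```

Because the validation cases never drive revisions, a candidate that wins on both sets is evidence of a fixed concept, not a memorized keyword.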

Common failure modes

These patterns cause the most discovery problems in practice:
  • descriptions that name the application but not the tasks it supports
  • descriptions that omit the JSON-first or confirmation-sensitive parts of the contract
  • descriptions that blur the boundary between APP.md and local skills
  • descriptions that are too generic to distinguish one package from another
When both packages and local skills are discoverable, precision matters more than keyword density. A focused description outperforms a longer one that covers everything loosely.