Skills: The Loadout System for AI Agents

Published: 2026-03-03 · 7 min


Every AI agent has a base capability set — what it can do by default. For most setups, that’s the model itself: read text, reason about it, produce text back. Maybe file access. Maybe web search if the platform enables it.

That’s not enough to run a real operation.

The gap between a capable model and a capable agent is tooling. And the mechanism that bridges that gap — in my stack, at least — is skills.

What a Skill Actually Is

A skill is a SKILL.md file. That’s it. Plain markdown. Plain text instructions that tell the agent: here is a specific tool, here is how to use it correctly, here are the patterns that work, here are the ones that will break. When the agent reads it, it picks up that capability as operational context.

Skills are not plugins. They’re not code that gets executed. They don’t call external APIs directly. They’re instruction sets — structured briefings that make a general-purpose agent more effective in a specific domain.

The analogy I keep coming back to: a skill is the difference between telling a new hire “we have a CRM” and giving them a 15-page onboarding doc on how the CRM is actually used, what fields matter, what workflows the team follows, and what mistakes to avoid. Same system. Completely different output quality.

A skill can cover any domain:

- How to use a CLI tool correctly (including which flags to prefer and which to avoid)
- How to structure API calls for a specific service
- What workflow to follow for a multi-step task
- What the common failure modes are and how to recover from them
- Platform-specific quirks that the model wouldn’t know from training alone

The General-Purpose Loadout

My agent infrastructure runs with a standard loadout of skills that cover the tools most operations need. Here’s the working set and what each actually does:

1password — Reading secrets without exposing them in shell commands. The skill documents how to use op read for clean secret injection, how to reference items by vault and item name, and the patterns for using 1Password as a secrets manager in scripts and automation runs. Without this, the agent either hardcodes credentials or asks for them manually. Neither is acceptable in an always-on ops setup.
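As a sketch of the pattern that skill documents (the vault and item names here are hypothetical, not from an actual setup), 1Password secret references keep credentials out of shell history and script source:

```shell
# Read a single secret by reference: op://<vault>/<item>/<field>
API_TOKEN=$(op read "op://Automation/github-bot/token")

# Or let op resolve references in env vars for one command's lifetime,
# so the plaintext secret never lands in the script itself
DEPLOY_KEY="op://Automation/deploy/private-key" op run -- ./deploy.sh
```

The value of the skill is pinning down patterns like these, so the agent reaches for `op read` instead of asking for the credential.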

github — Most people use only a fraction of what the gh CLI can do. The skill covers: PR status checks, CI run inspection, issue creation and comment threading, repo-level API queries, and viewing workflow logs. When a build fails at 2 AM, the engineering agent can surface the exact failing step from the CI log without anyone needing to open a browser.
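A few of the gh invocations that kind of skill would pin down (the PR number, run ID, and issue text are placeholders):

```shell
# CI status for a pull request
gh pr checks 123

# Recent workflow runs, then only the failing steps from one of them
gh run list --limit 5
gh run view 987654321 --log-failed

# File an issue without leaving the terminal
gh issue create --title "Nightly build failing" --body "See run 987654321"

# Raw API access; gh fills {owner}/{repo} from the current checkout
gh api repos/{owner}/{repo}/commits --jq '.[0].sha'
```

The `--log-failed` flag is the 2 AM workflow: it prints only the log output of failed steps, which is exactly what you want surfaced.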

agent-browser — Browser automation for tasks that require actual web interaction. Form submission, data extraction, screenshot capture, navigation flows. The skill documents how to structure automation sequences, what to do when pages load slowly, and how to handle authentication flows. This is what lets the agents do real web work rather than just reasoning about it.

summarize — Extract structured content from URLs, YouTube videos, podcasts, and local files. The skill covers the full workflow: hitting a URL, extracting readable content, running it through a summarization pipeline, and producing a structured output. This is the fallback for any “read this and tell me what’s in it” task regardless of source format.

things-mac — Task management via Things 3. Reading inbox and today list, adding tasks with tags and deadlines, updating existing tasks. Useful for keeping the agent aware of the human side of the work queue without having to manually brief it.

weather — Fetches current conditions and forecasts from wttr.in or Open-Meteo. No API key, no configuration. Useful for scheduling outdoor activities, travel planning, or anything that needs real-world environmental context.
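Both backends are plain HTTP, so the core of the skill can be two curl patterns (the coordinates below are Paris, purely as an example):

```shell
# wttr.in: one-line current conditions for a named location
curl -s "https://wttr.in/Paris?format=3"

# Open-Meteo: structured JSON forecast by coordinates, no API key required
curl -s "https://api.open-meteo.com/v1/forecast?latitude=48.85&longitude=2.35&current_weather=true"
```
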

xurl — Authenticated requests to the X (Twitter) API. Post, reply, search, read timeline, upload media. The skill covers OAuth handling, endpoint selection, and how to structure requests for the actions that come up most in content operations. Without this, X interaction is either manual or dependent on a third-party scheduler.

gog — Google Workspace CLI covering Gmail, Calendar, Drive, Contacts, Sheets, and Docs. The skill is extensive because the surface area is large. Most-used: sending email with attachments, reading calendar events, writing to Drive files, and querying Sheets as a data source.

blogwatcher — RSS/Atom feed monitoring. How to set up watches on blogs and feeds, how to receive update notifications, how to diff new vs. seen content. More useful than it sounds for research operations that need to track specific sources without manual checking.

There are more in the loadout, but these are the ones that get used in every active operation.

How to Add a New Skill

Two paths: install from ClawHub, or write your own.

ClawHub is the skill registry for the OpenClaw ecosystem. If a tool has broad enough usage, someone’s already written a SKILL.md for it. Installation is one command:

clawhub install <skill-name>

The skill lands in the skills directory and gets picked up on the next session. No restart required if the session prompt already includes skill scanning.

Custom skills are for tools and workflows that are specific to your operation. The format is straightforward: create a directory at ~/.openclaw/workspace/.agents/skills/<skill-name>/ and drop a SKILL.md inside it. Structure it however is most useful to you, but these are the patterns that work:

  1. What the tool does — one paragraph, no fluff
  2. How to invoke it — exact commands with flags
  3. Patterns that work — specific to your environment
  4. Failure modes — what breaks and why
  5. Examples — concrete invocations with expected output
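Following that outline, a skeleton for a hypothetical internal tool (every name, flag, and error code below is illustrative, not taken from a real skill):

```markdown
# mytool

## What it does
Fetches deployment status from our internal dashboard. One paragraph, no fluff.

## How to invoke it
mytool status --env prod --format json
Prefer --format json; the default table output truncates long service names.

## Patterns that work
- Always pass --env explicitly; the default varies by machine.

## Failure modes
- Exit code 3 means the dashboard is mid-deploy; wait 30s and retry.

## Examples
$ mytool status --env prod --format json
{"services": 12, "degraded": 0}
```
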

The agent reads the SKILL.md at session start when it determines the skill is relevant. The relevance check is semantic — it’s comparing task descriptions against skill descriptions, not doing exact string matching. A task about “schedule a reminder” will load the things-mac skill. A task about “pull PR status” will load github. Unrelated skills don’t get loaded.
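The post doesn’t specify how the relevance check is implemented. Purely as a toy illustration of description-based matching (as opposed to exact string matching), a keyword-overlap scorer reproduces the behavior described above — a real implementation would use embeddings, and the skill descriptions here are made up:

```python
def relevance(task: str, skill_description: str) -> float:
    """Toy stand-in for semantic matching: fraction of the skill
    description's words that also appear in the task."""
    task_words = set(task.lower().split())
    desc_words = set(skill_description.lower().split())
    if not desc_words:
        return 0.0
    return len(task_words & desc_words) / len(desc_words)

# Hypothetical one-line descriptions for two skills from the loadout
skills = {
    "things-mac": "task management reminder inbox today deadline",
    "github": "pull request pr status ci check issue repo",
}

def pick_skills(task: str, threshold: float = 0.1) -> list[str]:
    """Load every skill whose description clears the relevance bar."""
    return [name for name, desc in skills.items()
            if relevance(task, desc) >= threshold]
```

With this sketch, "schedule a reminder" selects only things-mac and "pull PR status" selects only github — the loading behavior the examples describe, without any exact-phrase matching.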

Why Loadout Composition Matters

The skills an agent has access to define what it can do. This sounds obvious but the implications are non-trivial.

An agent without the browser skill cannot interact with websites — full stop. It can reason about web content if you paste it in, but it cannot navigate, fill forms, extract data from dynamic pages, or take screenshots. The browser skill doesn’t just expand capability — it changes the category of work the agent can be dispatched for.

This is why matching skills to role is part of agent design, not agent configuration. My engineering agent has github, agent-browser, 1password, and coding-agent in its loadout. It does not have things-mac, blogwatcher, or xurl. Those are not tools a coding agent needs, and giving it access to them just adds noise to its context and creates paths for scope creep.

My writing specialist has summarize, gog (for reading documents and drafts), and the blog-specific skill I built for the content pipeline. It does not have github. When the writing specialist encounters something that needs a code fix, it should surface that to the engineering agent, not attempt to handle it.

The design principle: every skill in an agent’s loadout should be there because that agent will use it in the normal course of its work. Skills it might theoretically need once should be added at dispatch time, not baked into the base loadout.

Skills as Operational Memory

There’s a second function skills serve that’s less obvious: they capture institutional knowledge about how tools behave in your specific environment.

Generic documentation tells you how a CLI works. Your SKILL.md tells you how it works in your setup — which flags you’ve found useful, what your specific configuration looks like, what the common error states are in your environment, and what the workarounds are.

When I build a new skill, I document what I’ve learned through actually running the tool — what the edge cases are, what the failure modes look like, what the command sequence is for the operations I do most. Six months from now, a new agent session picks up that skill and runs the tool correctly on the first try, without rediscovering any of those lessons.

Skills are how you take tacit knowledge about your infrastructure and make it operationally available to every agent that needs it. That’s the compounding return: write the skill once, benefit from it every time the tool gets used.

The Honest Limitation

Skills are only as good as what’s in them. A skill that says “use the gh CLI for GitHub” without actually documenting the commands, flags, and patterns gives an agent only marginally more than it already had from training data.

The work is in the specificity. What exact commands do you run? What output do you care about? What does success look like? What does failure look like and how do you recover? The more concrete the skill, the more reliably the agent operates within it.

Writing a good SKILL.md takes an hour. It saves that hour every time the tool gets used — and it makes the output quality consistent rather than dependent on whether the model happened to have good training data for that specific tool in that specific configuration.

That’s the trade. Front-load the knowledge. Harvest it in every subsequent operation.


Using a tool that doesn’t have a skill yet? Email me at [email protected] and I’ll tell you whether it’s on the roadmap.


© Ridley Research. All rights reserved.