
Level-up your AI Agent with Skills Engineering


Skills engineering is how we teach AI agents to handle tasks in the way we want them done. Instead of hoping the agent figures it out, we provide detailed instructions that guide its decision-making process.

But not all skills are created equal.

A poorly written skill wastes tokens, confuses the agent, and produces inconsistent results. A well-crafted skill makes your agent faster, more reliable, and easier to maintain.

The quality of your skills directly determines your agent's performance.

Since skills are ultimately injected into the model's prompt, classic prompt engineering advice applies here, too.


This means that things like

  • having verifiable constraints,

  • adopting a relevant persona for the task,

  • and few-shot prompting techniques

will all still apply in this context. We'll explore these more as we go.


This article covers both what skills are and how to write them well. We'll discuss the anatomy of a skill, how they fit into the agent ecosystem, and the best practices that separate fragile skills from production-ready ones.


While many of the examples we'll look at revolve around coding, skills can be used to systematise any sort of task. With the addition of MCP servers to provide agents with access to external tools, the possibilities are endless.


I hope you're sitting comfortably. We've got a lot to cover.




First, what are agent skills?


A skill is a set of instructions that tells an AI agent how to complete a task in the way you intended. When you send a request, the agent checks which skills are available and decides if any are relevant. If one matches the task, it reads the full instructions and follows them.


Tangibly, a skill is just a set of markdown files in a folder. Each skill lives in its own folder, built around a definition file called SKILL.md. This file contains the skill's name, a short description, and the instructions themselves, all written in plain language. You can also include supporting reference files and a scripts folder for any code the agent might need to run as part of the workflow.
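Concretely, a minimal SKILL.md might look something like this (the folder name, frontmatter fields, and wording here are illustrative — check your agent's documentation for the exact format it expects):

```markdown
---
name: processing-pdfs
description: Extract text and tables from PDF files. Use when the user
  mentions PDFs, forms, or document extraction.
---

# Processing PDFs

Extract text with pdfplumber. For form filling, see [FORMS.md](FORMS.md).
For anything scripted, use the helpers in `scripts/`.
```

The name and description sit in the frontmatter because they're the only parts loaded upfront; everything below the second `---` is read on demand.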


The way skills load is worth understanding before we talk about writing them. At startup, the agent reads only the name and description from each installed skill (a few hundred tokens per skill, so you can have dozens without penalty). The full SKILL.md is only loaded when a request matches the description, and any supporting files are only read when the instructions actually call for them. This progressive disclosure mechanism keeps the context window lean whilst still giving the agent access to detailed documentation when it needs it.


For a fuller introduction to skills, this article covers the anatomy of a skill and the discovery process in more detail.



How to Write Better Skills


Now that we understand what skills are, let's talk about how to write effective ones. The difference between a skill that works and a skill that works well comes down to a few key principles.


1. Don't over-explain... but explain enough


Thanks to the progressive disclosure mechanism, only your skill's name and description are pre-loaded. The agent only reads the full SKILL.md when it decides the skill is relevant, and only reads additional files when it needs them.


However, once the skill file has been loaded, every word counts. Your skill shares the context window with everything else the agent needs to know, including the conversation history. There's a balance to strike: too little detail and the agent can't perform the task well; too much redundant information and you clog the context window or distract the agent.

Skills often need to be fine-tuned to get this balance right. Experiment with adding and removing details to see how it affects the outputs.

Too verbose:

PDF (Portable Document Format) files are a common file format that
contains text, images, and other content. To extract text from a PDF,
you'll need to use a library. There are many libraries available for
PDF processing, but we recommend pdfplumber because it's easy to use
and handles most cases well. First, you'll need to install it using pip...

Better:

Extract text with pdfplumber:

import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()

Agents already know what PDFs are. Just tell them which tool to use, briefly how to use it, and any specifics that might not be obvious or could be ambiguous.



2. Choose a good name


Skill names should be clear, descriptive, and follow a consistent pattern. Using the gerund form (verb + 'ing') is recommended because it clearly describes the activity the skill provides.

Skill names must use lowercase letters, numbers, and hyphens only.

Good naming examples:

  • processing-pdfs

  • analysing-spreadsheets

  • managing-databases

  • writing-documentation


Consistent naming makes it easier to reference skills and understand what they do at a glance.
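That character constraint is mechanical enough to check automatically. A small sketch — note that rejecting leading, trailing, and doubled hyphens is an assumption of mine beyond the rule stated above:

```python
import re

# Lowercase letters, numbers, and hyphens only, with hyphens used
# strictly as separators (no leading/trailing/doubled hyphens).
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

def is_valid_skill_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))
```

A check like this fits naturally in a pre-commit hook or CI step if you keep a shared repository of skills.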


3. Write a clear description


Your skill description is how the agent decides whether to use your skill. Make it specific and include a clear trigger.


Good example:

"Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction."

Include both what the skill does and when to use it. Think about the words users might say that should trigger this skill.


4. Match the level of freedom to the task


Different tasks call for different levels of freedom. Some need detailed guidance and constraints; for others, less restrictive instructions will yield better results.


A high-freedom approach is best when multiple methods could be valid, decisions depend on the context or situation, or the approach is guided by heuristics and soft rules.


The most restrictive approach directs the agent to run specific scripts with few parameters, effectively outsourcing the process to code that has been tried and tested. This leaves fewer avenues for errors to creep in.

High freedom can also be a useful starting point for developing a skill. Start with minimal guidance to see where the model falls down, then refine and rein it in as you test it.

Example - high freedom approach:

## Code review process
1. Analyse the code structure and organisation
2. Check for potential bugs or edge cases
3. Suggest improvements for readability and maintainability
4. Verify adherence to project conventions

Example - low freedom approach:

## Database migration

Run exactly this script:
`python scripts/migrate.py --verify --backup`
Do not modify the command or add additional flags.

Match the specificity to the task's fragility. Database migrations need guardrails. Code reviews benefit from flexibility.


5. Utilise progressive disclosure effectively


Try to keep your main SKILL.md file under 500 lines. If you have extensive documentation, split it into separate files and link to them from the main skill file.

# PDF Processing

## Quick start
[Basic instructions here]

## Advanced features
**Form filling**: See [FORMS.md](FORMS.md)
**API reference**: See [REFERENCE.md](REFERENCE.md)

The agent only loads those additional files when it needs them. This keeps the initial token cost low while still providing comprehensive documentation.


But...


Don't nest references too deeply. Keep all reference files one level deep from SKILL.md. The agent might only partially read files that are referenced from other referenced files.



6. Verifiable constraints and feedback loops


For complex tasks, include validation steps. Don't let the agent make changes and hope they worked.


For particularly complex multi-step workflows, provide a checklist that the agent can copy into its response and check off as it progresses. This helps both the agent and you track progress through the task.


Much like when adding constraints to a prompt, this step-by-step checklist should be actionable and verifiable rather than ambiguous.

## Document editing process

Copy this checklist and track your progress:

Task Progress:
- [ ] Step 1: Make edits to the file
- [ ] Step 2: Validate changes
- [ ] Step 3: Fix any validation errors
- [ ] Step 4: Re-validate
- [ ] Step 5: Complete the task

**Step 1: Make your edits to the file**
Edit the relevant sections in `word/document.xml`

**Step 2: Validate immediately**
Run: `python scripts/validate.py`

**Step 3: If validation fails**
- Review the error message carefully
- Fix the issues in the XML
- Note what you changed

**Step 4: Run validation again**
Don't proceed until validation passes

**Step 5: Complete the task**
Only when all checks pass

This pattern catches errors early instead of discovering problems at the end of a long workflow. The checklist pattern works for any complex process, even those without code, like research synthesis or content review workflows.
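The `scripts/validate.py` above is hypothetical, but for the `word/document.xml` example a minimal version might do nothing more than check well-formedness and print an error the agent can act on:

```python
import sys
import xml.etree.ElementTree as ET

def validate(path: str) -> tuple[bool, str]:
    """Return (ok, message); a failing message should point at the problem."""
    try:
        ET.parse(path)
        return True, "OK"
    except ET.ParseError as err:
        return False, f"{path}: {err}"

if __name__ == "__main__" and len(sys.argv) > 1:
    ok, message = validate(sys.argv[1])
    print(message)
    sys.exit(0 if ok else 1)
```

A non-zero exit code and a specific error message are what make the "fix, then re-validate" loop workable: the agent gets concrete feedback rather than a silent pass/fail.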




7. Use a few-shot approach


For skills where output quality matters, show examples of what good looks like:

## Commit message format

**Example 1:**
Input: Added user authentication with JWT tokens
Output:
feat(auth): implement JWT-based authentication

Add login endpoint and token validation middleware

**Example 2:**
Input: Fixed date formatting bug in reports
Output:
fix(reports): correct date formatting in timezone conversion

Use UTC timestamps consistently

Examples are worth a thousand words of description.


Be selective with your examples. They need to be diverse enough to represent the entire task. If your examples are too narrow, like using ideas from just one project, the agent might fixate on that specific niche and produce skewed, repetitive results.



A Skill Building Framework


There's a satisfying irony here: one of the best tools for writing skills is the agent you're trying to improve. Use it.


Draft. Describe the task in plain language and ask your agent to produce a minimal SKILL.md from the conversation. Keep it short. Short and testable beats thorough and untested.


Critique. Before running any real tasks, ask a second agent instance to review the draft. Brief it simply: find ambiguities, contradictions, and anything that could be cut. This catches structural problems faster than testing will.


Test. Run the skill against real tasks and log the failures. Three to five runs usually reveal a pattern. Classify each failure: missing instruction, ambiguous constraint, or wrong level of freedom. Then ask the agent to suggest targeted revisions based on what you found.


Improve. Once the skill performs consistently, ask the agent to generate edge case inputs and run those too. Decide deliberately which edge cases are worth encoding and which are rare enough to leave as known limitations.


Then keep the loop going. If you find yourself thinking "I need to remember to..." before the same step, that correction belongs in the skill. Every repeated correction is a sign that the skill hasn't yet captured what you actually know.




We've come full circle


AI began with expert systems in the 1970s and 80s. Instead of neural networks, these were rule-based programs where human experts painstakingly encoded their knowledge as hundreds of IF-THEN conditions.


Think of the spam filters that plagued email in the 2000s. Engineers maintained elaborate rule sets, trying to stay one step ahead of spammers. Block the phrase "Nigerian prince," and the spammers just changed the wording. Add another rule. Repeat forever. The maintenance burden was relentless.


Fast forward, and here we are again: writing detailed instructions for AI systems.


The difference is we're no longer debugging thousands of fragile lines of code. We describe what we want in plain language, and the model handles the gory bits.


Maybe it's less of a circle and more of a spiral. We've returned to the same fundamental approach, but with tools that are finally flexible enough to handle it.



Where to go next?


The principles here are straightforward to apply. Start with a task you already repeat, write a minimal skill, and test it against real work. You'll discover what's missing far more quickly than if you try to anticipate every edge case upfront.


If you want to go deeper into the broader agent ecosystem, skills work well alongside MCP servers, which provide safe, reliable access to external tools and real-time data. You can read more about them here.


And if you'd like to revisit the basics before putting any of this into practice, my guide to the basics of agent skills is a good place to start.
