How to Test the Functionality of a Defined Skill in Knowledge-Work-Plugins
Testing a defined skill in the anthropics/knowledge-work-plugins repository involves validating trigger phrases against tests/triggers.md, executing end-to-end scenarios using mocked MCP connectors via the Python test harness in plugins/test_runner.py, and asserting that generated artefacts match reference structures.
The anthropics/knowledge-work-plugins repository provides a framework for building AI-assisted skills using YAML front-matter and markdown definitions. To ensure reliability, you must test the functionality of a defined skill before deployment, which involves verifying trigger routing, workflow execution, and external connector interactions without requiring live API credentials.
How Skills Are Structured in the Repository
Each skill resides under a domain folder (e.g., small-business/skills/job-post-builder) and is defined by a YAML front-matter + markdown file named SKILL.md. The front-matter declares the skill name and description, while the body details trigger phrases, the step-by-step workflow, approval gates, and reference files loaded on-demand.
For comprehensive testing, every skill requires two specific test artefacts located in the tests/ subdirectory:
- tests/triggers.md: Lists must-trigger, must-NOT-trigger, and ambiguous user utterances to validate routing logic.
- tests/scenarios.md: Provides end-to-end scenario descriptions covering happy paths, missing connectors, and approval-gate flows.
Both files are referenced from SKILL.md to ensure the test harness can locate them automatically.
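As a rough illustration of the layout described above, the following sketch splits a SKILL.md-style document into front-matter and body, then reports which of the two test artefacts are missing from a skill folder. The helper names (split_front_matter, missing_test_artefacts), the example description text, and the flat key: value parsing are assumptions made for this example and are not taken from the repository; a real implementation would more likely rely on a YAML library.

```python
from pathlib import Path

# The two test artefacts every skill is expected to ship with.
REQUIRED_TEST_FILES = ("tests/triggers.md", "tests/scenarios.md")


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (front_matter, body).

    Assumption: the front-matter is a block of flat `key: value` lines
    delimited by `---` lines. Nested YAML is not handled here.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # no closing delimiter: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


def missing_test_artefacts(skill_dir: Path) -> list[str]:
    """Return the required test files that are absent from a skill folder."""
    return [rel for rel in REQUIRED_TEST_FILES if not (skill_dir / rel).is_file()]


# Hypothetical SKILL.md content used only for demonstration.
EXAMPLE_SKILL_MD = """---
name: job-post-builder
description: Drafts job posts and interview guides
---
Trigger phrases and workflow phases go here.
"""
```

Calling split_front_matter(EXAMPLE_SKILL_MD) yields the name and description as a dictionary plus the markdown body, and missing_test_artefacts(Path("small-business/skills/job-post-builder")) would list whichever of the two files is not yet in place.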
Test Harness Architecture
The testing framework operates through three distinct layers implemented in the plugins/ package.
The Three-Layer Testing Model
- Trigger-matching layer: Parses user utterances, loads the skill's trigger regexes derived from SKILL.md front-matter, and asserts the correct skill is selected.
- Workflow-execution layer: Runs the skill's phases using mocked MCP connectors (QuickBooks, Gmail, DocuSign, etc.) from the plugins/mocks/ package, replacing real network calls with deterministic stubs.
- Result-validation layer: Inspects generated artefacts (files saved via the docx skill, URLs returned for DocuSign drafts) against expectations defined in tests/scenarios.md.
The generic test runner lives in plugins/test_runner.py, while plugins/test_harness.py provides the core SkillRunner and SkillRouter utilities.
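To make the trigger-matching layer more concrete, here is a minimal sketch of regex-based routing. It does not reproduce the repository's SkillRouter; the TRIGGERS table, its patterns, and the select_skill function are hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical trigger table: skill name -> compiled patterns.
# A real harness would derive these from each skill's SKILL.md.
TRIGGERS = {
    "job-post-builder": [
        re.compile(r"\b(hire|hiring)\b", re.IGNORECASE),
        re.compile(r"\bjob post\b", re.IGNORECASE),
    ],
}


def select_skill(utterance: str) -> Optional[str]:
    """Return the first skill whose patterns match the utterance, else None."""
    for skill, patterns in TRIGGERS.items():
        if any(pattern.search(utterance) for pattern in patterns):
            return skill
    return None
```

Returning None for unmatched utterances keeps must-NOT-trigger checks simple: a test only has to assert that the result differs from the skill under test.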
Why Mocked MCP Connectors Are Essential
Skills depend on external MCP services like QuickBooks, HubSpot, and DocuSign. Real API calls would require credentials and create flaky CI pipelines. The plugins/mocks/ package supplies in-memory stand-ins that emulate connector contracts defined in reference schema files (e.g., reference/hubspot-fields.md), returning deterministic data for consistent testing.
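The idea of an in-memory stand-in can be sketched as follows. This is not the code in plugins/mocks/; the InMemoryConnector class and its set_response and call methods are assumptions made for this example, showing one way a stub could return canned data and record calls for later assertions.

```python
from typing import Any


class InMemoryConnector:
    """Hypothetical connector stub: records calls and returns canned data."""

    def __init__(self, service: str) -> None:
        self.service = service
        self.calls: list[tuple[str, dict[str, Any]]] = []
        self._responses: dict[str, Any] = {}

    def set_response(self, method: str, response: Any) -> None:
        """Register the deterministic value returned for a method name."""
        self._responses[method] = response

    def call(self, method: str, **params: Any) -> Any:
        """Record the call, then return the canned response or fail loudly."""
        self.calls.append((method, params))
        if method not in self._responses:
            raise KeyError(f"{self.service}: no canned response for {method!r}")
        return self._responses[method]


# Example usage with made-up method names and data.
hubspot = InMemoryConnector("hubspot")
hubspot.set_response("get_contact", {"id": "1", "email": "a@example.com"})
```

Raising on an unregistered method, rather than returning a default, makes a test fail as soon as a skill calls something the scenario did not anticipate.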
Writing Tests for a Defined Skill
Testing Trigger Matching
Validate that your skill correctly identifies relevant user utterances by creating tests using the SkillRouter class from plugins/test_harness.py:
# tests/triggers.py (example for job-post-builder)
from plugins.test_harness import SkillRouter


def test_job_post_builder_triggers():
    router = SkillRouter()

    # Must-trigger utterances
    assert router.select_skill("We need to hire a senior product manager") == "job-post-builder"
    assert router.select_skill("Write a job post for a data scientist") == "job-post-builder"

    # Must-NOT-trigger utterances
    assert router.select_skill("What is my cash flow?") != "job-post-builder"
    assert router.select_skill("Summarize the last email") != "job-post-builder"
Run these with pytest tests/triggers.py to verify the trigger-matching layer routes requests correctly.
Testing End-to-End Scenarios
For workflow validation, use SkillRunner with MockConnector instances to simulate complete conversations:
# tests/scenarios.py (happy-path for job-post-builder)
from pathlib import Path

from plugins.test_harness import SkillRunner, MockConnector


def test_job_post_builder_happy_path(tmp_path: Path):
    # Configure mock connectors
    mock_google = MockConnector(service="google_drive")
    mock_google.add_file("JD_template.docx", b"Template content …")
    mock_docx = MockConnector(service="docx")
    mock_docx.set_output_dir(tmp_path)

    # Initialize runner
    runner = SkillRunner(
        skill_name="job-post-builder",
        connectors={"google_drive": mock_google, "docx": mock_docx},
    )

    # Simulate conversation through phases 1-5
    runner.send_user_message(
        "We're hiring a senior product manager. Please write the job post and interview guide."
    )

    # Provide clarifications
    runner.provide_answer("Role title", "Senior Product Manager")
    runner.provide_answer("Key responsibilities", "Define product roadmap")
    runner.provide_answer("Must-have qualifications", "5+ years PM experience")
    runner.provide_answer("Offer letter delivery", "DocuSign")
    runner.run_until_phase(6)

    # Assertions: three documents are written to the mock output directory
    generated = list(tmp_path.glob("*.docx"))
    assert len(generated) == 3
    assert any("Job-Post.docx" in p.name for p in generated)
    assert any("Interview-Guide.docx" in p.name for p in generated)
    assert any("Offer-Letter.docx" in p.name for p in generated)
    # Assumes last_output() returns an object exposing contains(); if it
    # returns a plain str, use `"draft link" in runner.last_output()` instead.
    assert runner.last_output().contains("draft link")
This exercises the workflow-execution layer while respecting approval gates at phase 6.
Running the Test Suite
Executing the Test Runner
Run the complete test suite for a specific skill using the command-line interface:
python -m plugins.test_runner --skill job-post-builder
The runner performs the following actions:
- Loads the skill's YAML front-matter from SKILL.md
- Mocks required connectors (Google Drive, DocuSign, etc.)
- Executes the workflow phases
- Verifies generated files match reference structures in references/job-post-structure.md
Validating Output Artefacts
Inspect generated outputs in the temporary test directory (/tmp/skill_test_<timestamp>/) to debug mismatches. The harness checks that:
- Files created via the docx skill follow naming conventions ([Role]-Job-Post.docx)
- Documents contain required sections defined in references/job-post-structure.md
- No external actions (DocuSign envelope send, Gmail draft send) occur without explicit user confirmation
What to Verify for Each Skill
When you test the functionality of a defined skill, validate these specific categories:
- Trigger matching: Confirm the skill fires on all phrases listed in SKILL.md and ignores unrelated utterances.
- Phase execution: Verify each workflow step runs in order, asks expected clarification questions (Phase 1), loads appropriate reference files (Phases 3-5), and respects approval gates (Phase 6).
- Connector interactions: Ensure MCP connector calls use correct parameters and handle errors gracefully (e.g., fallback to CSV upload).
- Output artefacts: Check that generated files exist and match reference templates.
- Approval gates: Validate that the skill stops at "draft" stages for external actions without explicit confirmation.
- Edge-case handling: Cover scenarios from reference/gotchas.md where required connectors are missing or user context is incomplete.
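The approval-gate item in the list above is the easiest to get wrong, so a small sketch may help. It models an external action that can be previewed freely but refuses to execute until it has been approved. The GatedAction class and the ApprovalRequired exception are hypothetical names introduced for this example and do not come from the repository.

```python
class ApprovalRequired(Exception):
    """Raised when an external action is attempted without confirmation."""


class GatedAction:
    """Hypothetical wrapper for an external action behind an approval gate."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.approved = False
        self.executed = False

    def preview(self) -> str:
        """Producing a draft is always allowed and has no side effects."""
        return f"{self.name}: draft"

    def approve(self) -> None:
        """Record the user's explicit confirmation."""
        self.approved = True

    def execute(self) -> str:
        """Perform the action only after explicit approval."""
        if not self.approved:
            raise ApprovalRequired(f"{self.name} needs explicit user confirmation")
        self.executed = True
        return f"{self.name}: sent"
```

A gate test then has two halves: assert that execute() raises and executed stays False before approval, and assert that the action goes through once approve() has been called.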
Summary
- Skills are defined in SKILL.md files with YAML front-matter and require tests/triggers.md and tests/scenarios.md for comprehensive validation.
- The test harness in plugins/test_runner.py uses a three-layer architecture: trigger-matching, workflow-execution with mocked connectors, and result-validation.
- Mocked MCP connectors in plugins/mocks/ eliminate the need for live API credentials while providing deterministic test data.
- Use SkillRouter for unit-testing trigger phrases and SkillRunner for end-to-end scenario validation.
- Generated artefacts must conform to reference structures and respect approval gates before external actions execute.
Frequently Asked Questions
Where are test files located for a specific skill?
Test files reside in the tests/ subdirectory within each skill folder. For example, the job-post-builder skill stores its trigger tests in small-business/skills/job-post-builder/tests/triggers.md and scenario tests in small-business/skills/job-post-builder/tests/scenarios.md. The SKILL.md file references these locations so the test harness can discover them automatically.
How do I mock external services like DocuSign or Gmail?
Import MockConnector from plugins/test_harness.py and instantiate it with the service name (e.g., MockConnector(service="docusign")). The mock implementations in plugins/mocks/<service>.py emulate the connector contracts, allowing you to set deterministic return values for API calls. The test runner injects these mocks in place of real clients when you pass them to SkillRunner.
Can I run tests without real API credentials?
Yes. The repository's testing framework is designed to run entirely offline using the plugins/mocks/ package. Mocked connectors replace all external MCP service calls (QuickBooks, HubSpot, Google Drive, etc.) with in-memory stubs, enabling CI/CD pipelines to execute safely without credentials or network access to third-party APIs.
What happens if a skill fails the approval gate test?
If a skill attempts to send a DocuSign envelope or Gmail draft without explicit user confirmation, the test harness will fail the assertion checking runner.last_output() for draft status indicators. The skill must stop at the draft stage and provide a preview URL or file, waiting for user approval before executing destructive or external-facing actions.