How to Test the Functionality of a Defined Skill in Knowledge-Work-Plugins

Testing a defined skill in the anthropics/knowledge-work-plugins repository involves validating trigger phrases against tests/triggers.md, executing end-to-end scenarios using mocked MCP connectors via the Python test harness in plugins/test_runner.py, and asserting that generated artefacts match reference structures.

The anthropics/knowledge-work-plugins repository provides a framework for building AI-assisted skills from markdown files with YAML front-matter. Testing a skill before deployment means verifying its trigger routing, workflow execution, and external connector interactions, all without live API credentials.

How Skills Are Structured in the Repository

Each skill resides under a domain folder (e.g., small-business/skills/job-post-builder) and is defined in a file named SKILL.md, which combines YAML front-matter with a markdown body. The front-matter declares the skill name and description, while the body details trigger phrases, the step-by-step workflow, approval gates, and reference files loaded on demand.

For comprehensive testing, every skill requires two specific test artefacts located in the tests/ subdirectory:

  • tests/triggers.md: Lists must-trigger, must-NOT-trigger, and ambiguous user utterances to validate routing logic.
  • tests/scenarios.md: Provides end-to-end scenario descriptions covering happy paths, missing connectors, and approval-gate flows.

Both files are referenced from SKILL.md to ensure the test harness can locate them automatically.
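To make the file layout concrete, here is a minimal sketch of separating the front-matter from the markdown body. It assumes the front-matter is a block of flat key: value lines between two --- delimiter lines; the helper name split_front_matter and the sample text are illustrative, not taken from the repository. Nested or multi-line YAML values would need a real YAML parser.

```python
from typing import Dict, Tuple


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md-style document into (front-matter fields, markdown body).

    Assumption: front-matter is flat `key: value` lines between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block at the top of the file
    fields: Dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the markdown body
            return fields, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # closing delimiter never found


# Illustrative sample, not copied from the repository
SAMPLE = """---
name: job-post-builder
description: Drafts a job post and interview guide for a new role.
---
# Job Post Builder

Trigger phrases and workflow go here.
"""

fields, body = split_front_matter(SAMPLE)
print(fields["name"])
```

A loader built this way can fail fast when name or description is missing, before any workflow test runs.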

Test Harness Architecture

The testing framework operates through three distinct layers implemented in the plugins/ package.

The Three-Layer Testing Model

  1. Trigger-matching layer: Parses user utterances, loads the skill's trigger regexes derived from SKILL.md front-matter, and asserts the correct skill is selected.
  2. Workflow-execution layer: Runs the skill's phases using mocked MCP connectors (QuickBooks, Gmail, DocuSign, etc.) from the plugins/mocks/ package, replacing real network calls with deterministic stubs.
  3. Result-validation layer: Inspects generated artefacts (files saved via the docx skill, URLs returned for DocuSign drafts) against expectations defined in tests/scenarios.md.
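The trigger-matching layer can be pictured with a small stand-alone sketch. This is not the repository's SkillRouter; the TRIGGERS table, the cash-flow-report skill name, and the select_skill function are hypothetical, and the patterns are hand-written here rather than derived from SKILL.md.

```python
import re
from typing import Dict, List, Optional

# Hypothetical trigger patterns; a real harness would derive these from SKILL.md.
TRIGGERS: Dict[str, List[str]] = {
    "job-post-builder": [r"\bhire\b", r"\bhiring\b", r"\bjob post\b"],
    "cash-flow-report": [r"\bcash flow\b"],
}


def select_skill(
    utterance: str, triggers: Dict[str, List[str]] = TRIGGERS
) -> Optional[str]:
    """Return the first skill with a pattern matching the utterance, else None."""
    for skill, patterns in triggers.items():
        if any(re.search(p, utterance, re.IGNORECASE) for p in patterns):
            return skill
    return None


print(select_skill("We need to hire a senior product manager"))
print(select_skill("Summarize the last email"))
```

Returning None for unmatched utterances keeps "no skill selected" distinct from "wrong skill selected", which is the distinction the must-NOT-trigger cases rely on.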

The generic test runner lives in plugins/test_runner.py, while plugins/test_harness.py provides the core SkillRunner and SkillRouter utilities.

Why Mocked MCP Connectors Are Essential

Skills depend on external MCP services like QuickBooks, HubSpot, and DocuSign. Real API calls would require credentials and create flaky CI pipelines. The plugins/mocks/ package supplies in-memory stand-ins that emulate connector contracts defined in reference schema files (e.g., reference/hubspot-fields.md), returning deterministic data for consistent testing.
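As an illustration of what an in-memory stand-in might look like, the sketch below records every call and returns canned data. The class InMemoryConnector, its methods, and the get_contact operation are hypothetical names chosen for this example; the actual classes in plugins/mocks/ may be shaped differently.

```python
from typing import Any, Dict, List, Tuple


class InMemoryConnector:
    """Hypothetical stand-in for an MCP connector: records calls, returns canned data."""

    def __init__(self, service: str) -> None:
        self.service = service
        self._responses: Dict[str, Any] = {}
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def set_response(self, method: str, response: Any) -> None:
        """Register the deterministic value a later call to `method` returns."""
        self._responses[method] = response

    def call(self, method: str, **params: Any) -> Any:
        """Record the call, then return the canned response (no network involved)."""
        self.calls.append((method, params))
        if method not in self._responses:
            raise KeyError(f"{self.service}: no canned response for {method!r}")
        return self._responses[method]


hubspot = InMemoryConnector("hubspot")
hubspot.set_response("get_contact", {"email": "pat@example.com", "lifecyclestage": "lead"})
contact = hubspot.call("get_contact", contact_id="123")
print(contact["email"], hubspot.calls)
```

Because calls are recorded, a test can assert both on what the skill received and on which parameters it sent.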

Writing Tests for a Defined Skill

Testing Trigger Matching

Validate that your skill correctly identifies relevant user utterances by creating tests using the SkillRouter class from plugins/test_harness.py:


# tests/triggers.py (example for job-post-builder)

from plugins.test_harness import SkillRouter


def test_job_post_builder_triggers():
    router = SkillRouter()

    # Must-trigger utterances
    assert router.select_skill("We need to hire a senior product manager") == "job-post-builder"
    assert router.select_skill("Write a job post for a data scientist") == "job-post-builder"

    # Must-NOT-trigger utterances
    assert router.select_skill("What is my cash flow?") != "job-post-builder"
    assert router.select_skill("Summarize the last email") != "job-post-builder"

Run these with pytest tests/triggers.py to verify the trigger-matching layer routes requests correctly.
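Rather than hard-coding utterances, a test could load them from tests/triggers.md. The sketch below assumes a layout with one "## " heading per category followed by "- " bullet items; the article does not show the real layout, so treat both the format and the sample utterance under "Ambiguous" as hypothetical.

```python
from typing import Dict, List, Optional


def parse_trigger_cases(markdown: str) -> Dict[str, List[str]]:
    """Group bullet items under the most recent '## ' heading (lower-cased)."""
    cases: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip().lower()
            cases[current] = []
        elif line.startswith("- ") and current is not None:
            # Drop the bullet marker and any surrounding double quotes
            cases[current].append(line[2:].strip().strip('"'))
    return cases


# Assumed file layout, for illustration only
SAMPLE_TRIGGERS = """\
## Must trigger
- "We need to hire a senior product manager"
- "Write a job post for a data scientist"

## Must NOT trigger
- "What is my cash flow?"

## Ambiguous
- "Help me with the new role"
"""

cases = parse_trigger_cases(SAMPLE_TRIGGERS)
print(sorted(cases))
```

Each category can then drive its own assertion style: equality for must-trigger, inequality for must-NOT-trigger, and a manual review list for ambiguous cases.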

Testing End-to-End Scenarios

For workflow validation, use SkillRunner with MockConnector instances to simulate complete conversations:


# tests/scenarios.py (happy-path for job-post-builder)

from pathlib import Path

from plugins.test_harness import MockConnector, SkillRunner


def test_job_post_builder_happy_path(tmp_path: Path):
    # Configure mock connectors (bytes literals must be ASCII, hence "...")
    mock_google = MockConnector(service="google_drive")
    mock_google.add_file("JD_template.docx", b"Template content ...")

    mock_docx = MockConnector(service="docx")
    mock_docx.set_output_dir(tmp_path)

    # Initialize the runner with the mocks in place of real clients
    runner = SkillRunner(
        skill_name="job-post-builder",
        connectors={"google_drive": mock_google, "docx": mock_docx},
    )

    # Simulate the conversation through phases 1-5
    runner.send_user_message(
        "We're hiring a senior product manager. Please write the job post and interview guide."
    )

    # Provide clarifications
    runner.provide_answer("Role title", "Senior Product Manager")
    runner.provide_answer("Key responsibilities", "Define product roadmap")
    runner.provide_answer("Must-have qualifications", "5+ years PM experience")
    runner.provide_answer("Offer letter delivery", "DocuSign")

    # Advance to the approval gate at phase 6
    runner.run_until_phase(6)

    # Assert on the generated artefacts
    generated = list(tmp_path.glob("*.docx"))
    assert len(generated) == 3
    assert any("Job-Post.docx" in p.name for p in generated)
    assert any("Interview-Guide.docx" in p.name for p in generated)
    assert any("Offer-Letter.docx" in p.name for p in generated)

    # Assumes last_output() returns a str; Python strings have no .contains() method
    assert "draft link" in runner.last_output()

This exercises the workflow-execution layer while respecting approval gates at phase 6.
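The approval-gate behaviour can also be checked in isolation. The following is a minimal sketch, assuming a mock that tracks drafts and sends separately; DraftOnlyDocuSign, its methods, and the example.invalid URL are hypothetical and stand in for whatever the real DocuSign mock exposes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftOnlyDocuSign:
    """Hypothetical mock that tracks drafts and sends in separate lists."""

    drafts: List[str] = field(default_factory=list)
    sent: List[str] = field(default_factory=list)

    def create_draft(self, document: str) -> str:
        """Store a draft and return a preview link; nothing leaves the system."""
        self.drafts.append(document)
        return f"https://example.invalid/drafts/{len(self.drafts)}"

    def send(self, document: str, confirmed: bool = False) -> None:
        """Refuse to send unless the caller passes explicit confirmation."""
        if not confirmed:
            raise PermissionError("send requires explicit user confirmation")
        self.sent.append(document)


def test_stops_at_draft_without_confirmation() -> None:
    docusign = DraftOnlyDocuSign()
    link = docusign.create_draft("Offer-Letter.docx")
    assert link.endswith("/drafts/1")
    assert docusign.sent == []  # drafting alone must not send anything
    try:
        docusign.send("Offer-Letter.docx")
    except PermissionError:
        pass
    else:
        raise AssertionError("send should fail without confirmation")
    assert docusign.sent == []
```

Keeping drafts and sends in separate lists lets the test assert directly that nothing was sent, instead of inferring it from output text.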

Running the Test Suite

Executing the Test Runner

Run the complete test suite for a specific skill using the command-line interface:

python -m plugins.test_runner --skill job-post-builder

The runner performs the following actions:

  • Loads the skill's YAML front-matter from SKILL.md
  • Mocks required connectors (Google Drive, DocuSign, etc.)
  • Executes the workflow phases
  • Verifies generated files match reference structures in references/job-post-structure.md

Validating Output Artefacts

Inspect generated outputs in the temporary test directory (/tmp/skill_test_<timestamp>/) to debug mismatches. The harness checks that:

  • Files created via the docx skill follow naming conventions ([Role]-Job-Post.docx)
  • Documents contain required sections defined in references/job-post-structure.md
  • No external actions (DocuSign envelope send, Gmail draft send) occur without explicit user confirmation
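The first two checks above can be sketched as plain functions. The regular expression encodes one reading of the [Role]-Job-Post.docx convention (hyphen-separated alphanumeric role words), and the section names in the usage lines are hypothetical because the contents of references/job-post-structure.md are not shown here. The sketch also assumes the document text has already been extracted from the .docx file.

```python
import re
from pathlib import Path
from typing import Iterable, List

# One interpretation of [Role]-Job-Post.docx: hyphen-separated alphanumeric role words
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*-Job-Post\.docx$")


def badly_named(paths: Iterable[Path]) -> List[str]:
    """Return the sorted file names that do not follow the naming convention."""
    return sorted(p.name for p in paths if not NAME_PATTERN.match(p.name))


def missing_sections(text: str, required: Iterable[str]) -> List[str]:
    """Return the required section titles absent from the text (case-insensitive)."""
    lowered = text.lower()
    return [section for section in required if section.lower() not in lowered]


files = [Path("Senior-Product-Manager-Job-Post.docx"), Path("notes.docx")]
print(badly_named(files))

# Hypothetical section names, for illustration only
print(missing_sections("About the role\nResponsibilities\n", ["Responsibilities", "Qualifications"]))
```

Returning the offending names rather than a boolean makes a failed assertion self-explanatory in the test report.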

What to Verify for Each Skill

When you test the functionality of a defined skill, validate these specific categories:

  • Trigger matching: Confirm the skill fires on all phrases listed in SKILL.md and ignores unrelated utterances.
  • Phase execution: Verify each workflow step runs in order, asks expected clarification questions (Phase 1), loads appropriate reference files (Phases 3-5), and respects approval gates (Phase 6).
  • Connector interactions: Ensure MCP connector calls use correct parameters and handle errors gracefully (e.g., fallback to CSV upload).
  • Output artefacts: Check that generated files exist and match reference templates.
  • Approval gates: Validate that the skill stops at "draft" stages for external actions without explicit confirmation.
  • Edge-case handling: Cover scenarios from reference/gotchas.md where required connectors are missing or user context is incomplete.

Summary

  • Skills are defined in SKILL.md files with YAML front-matter and require tests/triggers.md and tests/scenarios.md for comprehensive validation.
  • The test harness in plugins/test_runner.py uses a three-layer architecture: trigger-matching, workflow-execution with mocked connectors, and result-validation.
  • Mocked MCP connectors in plugins/mocks/ eliminate the need for live API credentials while providing deterministic test data.
  • Use SkillRouter for unit-testing trigger phrases and SkillRunner for end-to-end scenario validation.
  • Generated artefacts must conform to reference structures and respect approval gates before external actions execute.

Frequently Asked Questions

Where are test files located for a specific skill?

Test files reside in the tests/ subdirectory within each skill folder. For example, the job-post-builder skill stores its trigger tests in small-business/skills/job-post-builder/tests/triggers.md and scenario tests in small-business/skills/job-post-builder/tests/scenarios.md. The SKILL.md file references these locations so the test harness can discover them automatically.

How do I mock external services like DocuSign or Gmail?

Import MockConnector from plugins/test_harness.py and instantiate it with the service name (e.g., MockConnector(service="docusign")). The mock implementations in plugins/mocks/<service>.py emulate the connector contracts, allowing you to set deterministic return values for API calls. The test runner injects these mocks in place of real clients when you pass them to SkillRunner.

Can I run tests without real API credentials?

Yes. The repository's testing framework is designed to run entirely offline using the plugins/mocks/ package. Mocked connectors replace all external MCP service calls (QuickBooks, HubSpot, Google Drive, etc.) with in-memory stubs, enabling CI/CD pipelines to execute safely without credentials or network access to third-party APIs.

What happens if a skill fails the approval gate test?

If a skill attempts to send a DocuSign envelope or a Gmail draft without explicit user confirmation, the assertion that checks runner.last_output() for draft-status indicators fails. The skill must stop at the draft stage and provide a preview URL or file, then wait for user approval before executing destructive or external-facing actions.
