How to Test Codex Plugins Before Publishing: Complete Local Testing Guide

Run the plugin-eval CLI to lint manifests, simulate natural language requests, and benchmark live Codex executions locally before submitting your plugin to the marketplace.

Testing a Codex plugin locally lets you confirm that its manifest, skills, and runtime commands work before publication. The openai/plugins repository provides a built-in plugin-eval harness that mirrors how Codex discovers plugins in production. This guide walks through the validation workflow step by step, using the CLI implemented in plugins/plugin-eval/scripts/plugin-eval.js.

Install the Plugin Locally for Testing

Codex discovers plugins through a local marketplace directory. Before running tests, you must symlink your plugin into the correct location so the runtime can load it.

Create the Marketplace Directory

Create the ~/.agents/plugins directory if it does not already exist:

mkdir -p ~/.agents/plugins

Create a symbolic link from your development directory to the marketplace location:

ln -sfn "$(pwd)/plugins/<plugin-name>" "$HOME/.agents/plugins/<plugin-name>"

For example, to test the render plugin from the repository root:

ln -sfn "$(pwd)/plugins/render" "$HOME/.agents/plugins/render"
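
The linking step can be wrapped in a small helper that also checks the link resolves. This is a minimal sketch, not part of plugin-eval: the function name link_plugin and the MARKETPLACE_DIR override are assumptions introduced here for illustration.

```shell
# Sketch: link a plugin into the local marketplace and confirm the link resolves.
# MARKETPLACE_DIR is a hypothetical override used here for illustration only.
# The body runs in a subshell, so variables stay local and `exit` only leaves the function.
link_plugin() (
  src="$1"
  dest_root="${MARKETPLACE_DIR:-$HOME/.agents/plugins}"

  # Refuse to link something that is not a directory.
  [ -d "$src" ] || { echo "not a directory: $src" >&2; exit 1; }

  # Use an absolute target so the link does not depend on the current directory.
  abs_src="$(cd "$src" && pwd)"
  name="$(basename "$abs_src")"

  mkdir -p "$dest_root"
  ln -sfn "$abs_src" "$dest_root/$name"

  # -d follows the symlink, so this test fails if the link is dangling.
  [ -d "$dest_root/$name" ] && echo "linked $name -> $abs_src"
)
```

From the repository root, link_plugin ./plugins/render would create the same link as the command above and report where it points.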

Register in marketplace.json

Add your plugin to the marketplace descriptor so Codex can discover it. Create or edit ~/.agents/plugins/marketplace.json:

{
  "name": "local",
  "interface": { "displayName": "Local Plugins" },
  "plugins": [
    {
      "name": "<plugin-name>",
      "source": { "source": "local", "path": "./plugins/<plugin-name>" },
      "policy": {
        "installation": "AVAILABLE",
        "authentication": "ON_INSTALL"
      },
      "category": "Developer Tools"
    }
  ]
}

Restart Codex or the Codex CLI to reload the marketplace configuration.
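
If no marketplace.json exists yet, the descriptor can be generated from the shell rather than typed by hand. The sketch below reproduces the JSON from this section with the plugin name filled in. The helper name write_marketplace and the MARKETPLACE_DIR override are assumptions for illustration, and the helper refuses to overwrite an existing file.

```shell
# Sketch: write a single-plugin marketplace.json that mirrors the descriptor above.
# MARKETPLACE_DIR is a hypothetical override used here for illustration only.
write_marketplace() (
  name="$1"
  dir="${MARKETPLACE_DIR:-$HOME/.agents/plugins}"
  file="$dir/marketplace.json"

  # Accept only simple names so the value can be embedded in JSON without escaping.
  case "$name" in
    (""|*[!A-Za-z0-9._-]*) echo "invalid plugin name: $name" >&2; exit 2 ;;
  esac

  # Never overwrite an existing descriptor; it may list other plugins.
  [ ! -e "$file" ] || { echo "already exists: $file" >&2; exit 1; }

  mkdir -p "$dir"
  cat > "$file" <<EOF
{
  "name": "local",
  "interface": { "displayName": "Local Plugins" },
  "plugins": [
    {
      "name": "$name",
      "source": { "source": "local", "path": "./plugins/$name" },
      "policy": {
        "installation": "AVAILABLE",
        "authentication": "ON_INSTALL"
      },
      "category": "Developer Tools"
    }
  ]
}
EOF
  echo "wrote $file"
)
```

Calling write_marketplace render produces a descriptor with a single entry for the render plugin. To register additional plugins, edit the plugins array by hand.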

Validate the Plugin Manifest

Each plugin requires a valid manifest at .codex-plugin/plugin.json. Run the analysis command to verify schema compliance:

node ./plugins/plugin-eval/scripts/plugin-eval.js analyze ./plugins/<plugin-name> --format markdown

The analyze command validates required fields including name, description, version, and manifestVersion. If the JSON is malformed, the CLI reports the exact line and field requiring correction.
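
When a checkout contains several plugins, the same command can be applied to each one that has a manifest. The following is a rough sketch under two assumptions: that the CLI exits with a non-zero status when analysis fails, and that PLUGIN_EVAL (a hypothetical variable introduced here) may replace the default command for dry runs.

```shell
# Sketch: run `analyze` on every directory under a plugins root that has a manifest.
# PLUGIN_EVAL is a hypothetical override for the CLI command, handy for dry runs.
analyze_all() (
  root="${1:-./plugins}"
  eval_cmd="${PLUGIN_EVAL:-node ./plugins/plugin-eval/scripts/plugin-eval.js}"
  failed=0

  for dir in "$root"/*/; do
    dir="${dir%/}"                # drop the trailing slash left by the glob
    [ -d "$dir" ] || continue     # the glob matched nothing
    if [ ! -f "$dir/.codex-plugin/plugin.json" ]; then
      echo "skip: $dir (no .codex-plugin/plugin.json)"
      continue
    fi
    # $eval_cmd is intentionally unquoted so "node <script>" splits into words.
    # Assumption: the CLI exits non-zero when analysis fails.
    $eval_cmd analyze "$dir" --format markdown || failed=1
  done

  exit "$failed"
)
```

Setting PLUGIN_EVAL=echo before calling analyze_all prints the commands that would run without executing them.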

Simulate User Interactions with Chat-First Testing

Test how Codex routes natural language requests to your plugin skills using the start command:

node ./plugins/plugin-eval/scripts/plugin-eval.js start ./plugins/<plugin-name> \
  --request "Evaluate this plugin." \
  --format markdown

This outputs the exact command sequence Codex will execute, including the routed skill and any subsequent analyze or initialization steps. According to the plugin-eval README, this simulates the live user experience without requiring a full Codex UI session.
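
A single request rarely covers every skill, so it can help to replay a list of representative phrasings. The sketch below reads one request per line from a text file and passes each to the start command. The function name route_requests, the requests file, and the PLUGIN_EVAL override are assumptions for illustration; only the start, --request, and --format arguments come from the command shown above.

```shell
# Sketch: replay several natural language requests through `start`.
# PLUGIN_EVAL is a hypothetical override for the CLI command, handy for dry runs.
route_requests() (
  plugin_dir="$1"
  requests_file="$2"              # plain text, one request per line
  eval_cmd="${PLUGIN_EVAL:-node ./plugins/plugin-eval/scripts/plugin-eval.js}"

  # The `|| [ -n ... ]` clause keeps a final line that lacks a trailing newline.
  while IFS= read -r request || [ -n "$request" ]; do
    [ -n "$request" ] || continue # skip blank lines
    echo "== $request"
    # Redirect stdin so the CLI cannot consume the rest of the requests file.
    $eval_cmd start "$plugin_dir" --request "$request" --format markdown </dev/null
  done < "$requests_file"
)
```

Each request is echoed as a header line before its output, which makes it easier to see which phrasing produced which routing result.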

Run Static Analysis

The same analyze command used for manifest validation also performs static analysis of skill metadata, asset consistency, and safety constraints:

node ./plugins/plugin-eval/scripts/plugin-eval.js analyze ./plugins/<plugin-name> --format markdown

The analysis validates:

  • Schema compliance: Required fields in plugin.json
  • Skill front-matter: Presence of retrieval.aliases, intents, and pathPatterns in SKILL.md files
  • Asset integrity: All referenced files exist with no stray binaries
  • Safety constraints: No hard-coded secrets and proper permission declarations
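
Before running the full analysis, a quick look at the front-matter can catch an obviously incomplete SKILL.md. The sketch below is a crude approximation rather than what plugin-eval does. It assumes the front-matter is delimited by --- lines at the top of the file and only checks that the key names aliases, intents, and pathPatterns appear there, without validating their structure or values.

```shell
# Sketch: report which expected keys are absent from a SKILL.md front-matter block.
# Assumption: front-matter sits between '---' lines at the very top of the file.
check_skill_front_matter() (
  file="$1"

  # Keep only the lines between the opening and closing '---' markers.
  front="$(awk 'NR==1 && $0=="---" {fm=1; next} fm && $0=="---" {exit} fm {print}' "$file")"

  status=0
  for key in aliases intents pathPatterns; do
    # Look for "key:" as a fixed string; this does not parse the YAML.
    if ! printf '%s\n' "$front" | grep -qF "$key:"; then
      echo "$file: missing $key in front-matter"
      status=1
    fi
  done

  exit "$status"
)
```

The function prints one line per missing key and returns a non-zero status if any are absent, so it can gate a longer run.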

Benchmark Live Executions (Optional)

For plugins that interact with external services, generate realistic usage profiles to test runtime behavior.

Initialize and Run Benchmarks

node ./plugins/plugin-eval/scripts/plugin-eval.js init-benchmark ./plugins/<plugin-name>
node ./plugins/plugin-eval/scripts/plugin-eval.js benchmark ./plugins/<plugin-name> --format markdown

Benchmarks write JSONL usage logs to .plugin-eval/runs/<timestamp>/usage.jsonl.

Generate Measurement Plans

Feed usage logs back into the evaluator to create measurement plans:

node ./plugins/plugin-eval/scripts/plugin-eval.js measurement-plan ./plugins/<plugin-name> \
  --observed-usage .plugin-eval/runs/<timestamp>/usage.jsonl \
  --format markdown

The benchmark harness implementation is documented in plugins/plugin-eval/references/benchmark-harness.md.
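
Looking up the right <timestamp> directory by hand becomes tedious after a few runs. The helper below picks the most recent usage log. It is a sketch that assumes run directory names sort chronologically when compared as text. The function name latest_usage_log is introduced here for illustration.

```shell
# Sketch: print the path of the newest usage.jsonl under a runs directory.
# Assumption: run directory names sort chronologically as plain text.
latest_usage_log() (
  runs_dir="${1:-.plugin-eval/runs}"
  latest=""

  # Globs expand in sorted order, so the last match with a log is the newest.
  for dir in "$runs_dir"/*/; do
    [ -f "${dir}usage.jsonl" ] || continue   # skip runs without a usage log
    latest="${dir}usage.jsonl"
  done

  # Print nothing and return non-zero when no log was found.
  [ -n "$latest" ] && printf '%s\n' "$latest"
)
```

The result can then be passed to the measurement-plan command as --observed-usage "$(latest_usage_log)".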

Common Failure Modes and Fixes

When plugin-eval reports errors, check these typical issues:

  • Missing front-matter: SKILL.md files require retrieval.aliases, intents, and other metadata fields
  • Invalid paths: Verify all paths in .codex-plugin/plugin.json resolve correctly from the plugin root
  • Environment variables: Ensure required variables like OPENAI_API_KEY are set; note that plugin-eval deliberately never prints secret values to prevent leakage
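
In the same spirit as plugin-eval's handling of secrets, required variables can be checked without echoing their values. This sketch reports only whether each named variable is set. The function name require_env is an assumption, and OPENAI_API_KEY appears only because it is the example given above.

```shell
# Sketch: confirm that variables are set without ever printing their values.
require_env() (
  missing=0
  for var in "$@"; do
    # Only accept plain identifiers, since the name is passed to eval below.
    case "$var" in
      (""|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid variable name: $var" >&2; exit 2 ;;
    esac

    # Indirect lookup: copy the value of the variable named by $var.
    eval "value=\${$var:-}"

    if [ -n "$value" ]; then
      echo "$var: set"            # the value itself is never written out
    else
      echo "$var: MISSING"
      missing=1
    fi
  done
  exit "$missing"
)
```

Running require_env OPENAI_API_KEY before a benchmark gives an early, leak-free signal that the environment is incomplete.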

Fix reported issues iteratively, re-running the relevant plugin-eval commands until all diagnostics pass.

Summary

Testing Codex plugins locally helps prevent publishing broken manifests or non-functional skills. Key steps include:

  • Symlink your plugin to ~/.agents/plugins/ and register it in marketplace.json
  • Run plugin-eval analyze to validate JSON schema and skill metadata
  • Use plugin-eval start to simulate natural language routing
  • Run plugin-eval benchmark to test live executions against external services
  • Iterate until static analysis reports zero errors and benchmarks complete successfully

Frequently Asked Questions

How do I test a Codex plugin without modifying my marketplace?

Invoke the CLI directly by path. Run node ./plugins/plugin-eval/scripts/plugin-eval.js analyze ./plugins/<plugin-name> without creating symlinks or editing marketplace.json. This approach suits CI pipelines, where persistent marketplace modifications are undesirable.
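
For CI, the commands from this guide could be chained so the job stops at the first failing step. The following is a sketch only. It assumes plugin-eval exits with a non-zero status when a step fails, and both the function name run_checks and the PLUGIN_EVAL override are hypothetical.

```shell
# Sketch: run the static checks in order and stop at the first failure.
# Assumption: the CLI exits non-zero when a step reports errors.
run_checks() (
  plugin_dir="$1"
  eval_cmd="${PLUGIN_EVAL:-node ./plugins/plugin-eval/scripts/plugin-eval.js}"

  # $eval_cmd is intentionally unquoted so "node <script>" splits into words.
  $eval_cmd analyze "$plugin_dir" --format markdown \
    || { echo "FAILED: analyze"; exit 1; }

  $eval_cmd start "$plugin_dir" --request "Evaluate this plugin." --format markdown \
    || { echo "FAILED: start"; exit 1; }

  echo "all checks passed"
)
```

A pipeline step would call run_checks ./plugins/<plugin-name> and rely on its exit status. Benchmarks are left out here because they perform live executions.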

What does the plugin-eval start command actually simulate?

The start command simulates Codex's natural language processing pipeline. It takes a user request, routes it to the appropriate skill based on front-matter matching, and displays the exact CLI command sequence that would execute. This verifies intent recognition without requiring a running Codex UI instance.

Where does plugin-eval store benchmark results?

Benchmark results are written to .plugin-eval/runs/<timestamp>/usage.jsonl in your plugin directory. You can reference this file with the --observed-usage flag when generating measurement plans to analyze performance characteristics and API call patterns.

Why does plugin-eval not show my API key in error messages?

The tool explicitly filters secret values from all output to prevent credential leakage in logs. As documented in the plugin-eval README, the CLI performs safety checks but never prints environment variable values that match secret patterns, ensuring secure CI/CD integration.
