How to Test Codex Plugins Before Publishing: Complete Local Testing Guide
Run the plugin-eval CLI to lint manifests, simulate natural language requests, and benchmark live Codex executions locally before submitting your plugin to the marketplace.
Testing a Codex plugin locally helps confirm that the manifest, skills, and runtime commands work before publication. The openai/plugins repository provides a built-in plugin-eval harness that mirrors how Codex discovers plugins in production. This guide covers the step-by-step validation workflow using the CLI implementation found in plugins/plugin-eval/scripts/plugin-eval.js.
Install the Plugin Locally for Testing
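Codex discovers plugins through a local marketplace directory. To load your plugin in the Codex runtime, symlink it into that location. The plugin-eval commands later in this guide also accept a direct path to the plugin, so this step matters only when you want Codex itself to pick the plugin up.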
Create the Marketplace Directory
Create the plugins directory in your home folder if it does not exist:
mkdir -p ~/.agents/plugins
Link Your Plugin Source
Create a symbolic link from your development directory to the marketplace location:
ln -sfn "$(pwd)/plugins/<plugin-name>" "$HOME/.agents/plugins/<plugin-name>"
For example, to test the render plugin from the repository root:
ln -sfn "$(pwd)/plugins/render" "$HOME/.agents/plugins/render"
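A quick way to confirm the link took effect is sketched below. The function name check_plugin_link is hypothetical and not part of plugin-eval; it only assumes the layout described in this guide (a symlink under ~/.agents/plugins pointing at a directory that contains .codex-plugin/plugin.json).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of plugin-eval.
# check_plugin_link NAME [MARKETPLACE_DIR]
# Succeeds only if MARKETPLACE_DIR/NAME is a symlink that resolves to a
# directory containing .codex-plugin/plugin.json.
check_plugin_link() {
  local name="$1"
  local market="${2:-$HOME/.agents/plugins}"
  local link="$market/$name"

  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  # -d follows the link, so a dangling symlink fails here.
  if [ ! -d "$link" ]; then
    echo "dangling symlink: $link" >&2
    return 1
  fi
  if [ ! -f "$link/.codex-plugin/plugin.json" ]; then
    echo "no manifest under: $link" >&2
    return 1
  fi
  echo "ok: $link"
}
```

After sourcing the function, running check_plugin_link render should print an ok line if the symlink from the example above is in place.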
Register in marketplace.json
Add your plugin to the marketplace descriptor so Codex can discover it. Create or edit ~/.agents/plugins/marketplace.json:
{
  "name": "local",
  "interface": { "displayName": "Local Plugins" },
  "plugins": [
    {
      "name": "<plugin-name>",
      "source": { "source": "local", "path": "./plugins/<plugin-name>" },
      "policy": {
        "installation": "AVAILABLE",
        "authentication": "ON_INSTALL"
      },
      "category": "Developer Tools"
    }
  ]
}
Restart Codex or the Codex CLI to reload the marketplace configuration.
Validate the Plugin Manifest
Each plugin requires a valid manifest at .codex-plugin/plugin.json. Run the analysis command to verify schema compliance:
node ./plugins/plugin-eval/scripts/plugin-eval.js analyze ./plugins/<plugin-name> --format markdown
The analyze command validates required fields including name, description, version, and manifestVersion. If the JSON is malformed, the CLI reports the exact line and field requiring correction.
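Before invoking the CLI, a rough pre-check for the four fields named above can catch obvious omissions. This is only a sketch: missing_manifest_fields is a hypothetical helper, it matches key names textually with grep rather than parsing JSON (so a nested key with the same name would also count), and it does not replace the analyze command.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of plugin-eval.
# missing_manifest_fields FILE
# Prints each expected key that does not appear in FILE as "key":.
# Returns 0 when none are missing, 1 otherwise.
missing_manifest_fields() {
  local file="$1" field missing=0
  for field in name description version manifestVersion; do
    # Text match only: looks for "field" followed by a colon.
    if ! grep -Eq "\"$field\"[[:space:]]*:" "$file"; then
      echo "$field"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, missing_manifest_fields ./plugins/render/.codex-plugin/plugin.json prints nothing when all four keys are present.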
Simulate User Interactions with Chat-First Testing
Test how Codex routes natural language requests to your plugin skills using the start command:
node ./plugins/plugin-eval/scripts/plugin-eval.js start ./plugins/<plugin-name> \
--request "Evaluate this plugin." \
--format markdown
This outputs the exact command sequence Codex will execute, including the routed skill and any subsequent analyze or initialization steps. According to the plugin-eval README, this simulates the live user experience without requiring a full Codex UI session.
Run Static Analysis
The same analyze command goes beyond the manifest and also checks skill metadata, asset consistency, and safety constraints:
node ./plugins/plugin-eval/scripts/plugin-eval.js analyze ./plugins/<plugin-name> --format markdown
The analysis validates:
- Schema compliance: Required fields in plugin.json
- Skill front-matter: Presence of retrieval.aliases, intents, and pathPatterns in SKILL.md files
- Asset integrity: All referenced files exist with no stray binaries
- Safety constraints: No hard-coded secrets and proper permission declarations
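The front-matter item in the list above can be approximated by hand with the sketch below. It rests on assumptions that the source does not confirm: that SKILL.md begins with a YAML block delimited by lines containing only ---, and that the keys appear literally as aliases:, intents:, and pathPatterns: at any indentation. Both function names are hypothetical, and the real checks in plugin-eval may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of plugin-eval.

# front_matter FILE: prints the lines between the opening and closing '---'.
# Prints nothing if the first line of FILE is not '---'.
front_matter() {
  awk 'NR == 1 && $0 != "---" { exit }
       NR == 1 { next }
       $0 == "---" { exit }
       { print }' "$1"
}

# missing_skill_keys FILE: prints each assumed key absent from the front matter.
# Returns 0 when none are missing, 1 otherwise.
missing_skill_keys() {
  local fm key missing=0
  fm="$(front_matter "$1")"
  for key in aliases intents pathPatterns; do
    if ! printf '%s\n' "$fm" | grep -Eq "^[[:space:]]*$key[[:space:]]*:"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}
```

Only the front-matter block is searched, so a key that appears in the body of SKILL.md does not count as present.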
Benchmark Live Executions (Optional)
For plugins that interact with external services, generate realistic usage profiles to test runtime behavior.
Initialize and Run Benchmarks
node ./plugins/plugin-eval/scripts/plugin-eval.js init-benchmark ./plugins/<plugin-name>
node ./plugins/plugin-eval/scripts/plugin-eval.js benchmark ./plugins/<plugin-name> --format markdown
Benchmarks write JSONL usage logs to .plugin-eval/runs/<timestamp>/usage.jsonl.
Generate Measurement Plans
Feed usage logs back into the evaluator to create measurement plans:
node ./plugins/plugin-eval/scripts/plugin-eval.js measurement-plan ./plugins/<plugin-name> \
--observed-usage .plugin-eval/runs/<timestamp>/usage.jsonl \
--format markdown
The benchmark harness implementation is documented in plugins/plugin-eval/references/benchmark-harness.md.
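Because each run lands in its own timestamped directory, it can help to resolve the newest usage.jsonl automatically instead of typing the timestamp. The sketch below assumes that run directory names sort lexicographically in chronological order, which the source does not state; latest_usage_log is a hypothetical helper, and the base directory is a parameter because the location of .plugin-eval depends on where the benchmark was run.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of plugin-eval.
# latest_usage_log [BASE_DIR]
# Prints the usage.jsonl path from the run directory that sorts last.
# Assumes timestamped directory names sort in chronological order.
latest_usage_log() {
  local base="${1:-.}" log latest=""
  # Glob results are sorted, so the last existing match wins.
  for log in "$base"/.plugin-eval/runs/*/usage.jsonl; do
    [ -f "$log" ] && latest="$log"
  done
  if [ -z "$latest" ]; then
    echo "no usage.jsonl under $base/.plugin-eval/runs" >&2
    return 1
  fi
  echo "$latest"
}
```

The printed path can then be passed as the --observed-usage value in the measurement-plan command shown above.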
Common Failure Modes and Fixes
When plugin-eval reports errors, check these typical issues:
- Missing front-matter: SKILL.md files require retrieval.aliases, intents, and other metadata fields
- Invalid paths: Verify all paths in .codex-plugin/plugin.json resolve correctly from the plugin root
- Environment variables: Ensure required variables like OPENAI_API_KEY are set; note that plugin-eval deliberately never prints secret values to prevent leakage
Fix reported issues iteratively, re-running the relevant plugin-eval commands until all diagnostics pass.
Summary
Testing Codex plugins locally prevents publishing broken manifests or non-functional skills. Key steps include:
- Symlink your plugin to ~/.agents/plugins/ and register it in marketplace.json
- Run plugin-eval analyze to validate JSON schema and skill metadata
- Use plugin-eval start to simulate natural language routing
- Execute plugin-eval benchmark for live execution testing of external service integrations
- Iterate until static analysis reports zero errors and benchmarks complete successfully
Frequently Asked Questions
How do I test a Codex plugin without modifying my marketplace?
Use the CLI direct path invocation. Run node ./plugins/plugin-eval/scripts/plugin-eval.js analyze ./plugins/<plugin-name> without creating symlinks or editing marketplace.json. This approach works in CI pipelines where persistent marketplace modifications are undesirable.
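In a CI job, the direct-path invocation can be wrapped in a loop over every plugin directory. The sketch below is one way to do that. The names for_each_plugin and analyze_plugin are hypothetical, and it assumes the analyze command exits with a non-zero status when it finds problems, which the source does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of plugin-eval.

# for_each_plugin ROOT CMD [ARGS...]
# Runs CMD ARGS... <plugin-dir> for every directory under ROOT that contains
# .codex-plugin/plugin.json. Returns 1 if any run failed, 0 otherwise.
for_each_plugin() {
  local root="$1" dir failed=0
  shift
  for dir in "$root"/*/; do
    dir="${dir%/}"
    [ -f "$dir/.codex-plugin/plugin.json" ] || continue
    if ! "$@" "$dir"; then
      echo "FAILED: $dir" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Wraps the analyze invocation from this guide so the plugin path can be
# supplied as the only argument.
# ASSUMPTION: analyze exits non-zero when diagnostics fail.
analyze_plugin() {
  node ./plugins/plugin-eval/scripts/plugin-eval.js analyze "$1" --format markdown
}
```

From the repository root, for_each_plugin ./plugins analyze_plugin would then analyze each plugin in turn and return a failing status if any single run failed.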
What does the plugin-eval start command actually simulate?
The start command simulates Codex's natural language processing pipeline. It takes a user request, routes it to the appropriate skill based on front-matter matching, and displays the exact CLI command sequence that would execute. This verifies intent recognition without requiring a running Codex UI instance.
Where does plugin-eval store benchmark results?
Benchmark results are written to .plugin-eval/runs/<timestamp>/usage.jsonl in your plugin directory. You can reference this file with the --observed-usage flag when generating measurement plans to analyze performance characteristics and API call patterns.
Why does plugin-eval not show my API key in error messages?
The tool explicitly filters secret values from all output to prevent credential leakage in logs. As documented in the plugin-eval README, the CLI performs safety checks but never prints environment variable values that match secret patterns, ensuring secure CI/CD integration.