How verify-pipeline.mjs Performs Health Checks on the Career-Ops Job Tracker

verify-pipeline.mjs parses the applications.md job tracker and runs seven deterministic validations, from canonical status verification to report-link existence, then exits with code 1 when any errors are detected.

The santifer/career-ops repository relies on verify-pipeline.mjs to safeguard the integrity of its job-tracking pipeline. This Node.js script reads the tracker file and related auxiliary resources, then applies a strict set of health checks to catch duplicates, malformed rows, and broken references before they propagate downstream.

File Discovery and Initial Parsing

Before any validation runs, verify-pipeline.mjs resolves the path to the tracker file (lines 22‑28). It first checks for data/applications.md, falls back to a root-level applications.md, and respects the CAREER_OPS_TRACKER environment variable if set.
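The lookup order can be sketched as a small pure function. This is a minimal illustration, not the script's actual code: the helper name resolveTrackerPath, the injected exists predicate, and the choice to let CAREER_OPS_TRACKER take precedence over both file locations are all assumptions.

```javascript
// Hypothetical helper illustrating the tracker lookup described above.
// Assumption: the environment override wins over both default locations.
function resolveTrackerPath(root, env, exists) {
  if (env.CAREER_OPS_TRACKER) {
    return env.CAREER_OPS_TRACKER;
  }
  const preferred = `${root}/data/applications.md`;
  if (exists(preferred)) {
    return preferred;
  }
  // Fall back to a tracker stored at the repository root.
  return `${root}/applications.md`;
}
```

In a real run you would likely pass process.env and fs.existsSync as the last two arguments; injecting them keeps the sketch easy to test without touching the disk.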

Once located, the script reads the tracker as a UTF‑8 string, splits it into lines, and parses every line beginning with | into a structured record object (lines 71‑83). Each record exposes the following fields: num, date, company, role, score, status, pdf, report, and notes. Header and divider lines are automatically skipped during this phase.
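A rough sketch of this parsing step follows. The field names come from the list above; the function name parseTracker and the rule used to skip header and divider rows (the first cell must be a plain integer) are assumptions made for illustration.

```javascript
const FIELDS = ["num", "date", "company", "role", "score", "status", "pdf", "report", "notes"];

// Hypothetical parser: turns pipe-delimited tracker rows into record objects.
function parseTracker(text) {
  const records = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith("|")) continue;
    // Drop the outer pipes, then split the remaining cells.
    const cells = line
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());
    // Assumption: data rows start with an integer entry number, which
    // excludes the header row and the |---|---| divider row.
    if (!/^\d+$/.test(cells[0])) continue;
    const record = {};
    FIELDS.forEach((field, i) => {
      record[field] = cells[i] ?? "";
    });
    record.num = Number(cells[0]);
    records.push(record);
  }
  return records;
}
```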

The Seven Core Health Checks

After parsing, the script runs its checks in sequence. Errors are logged with a red ❌ prefix, warnings with ⚠️, and passing checks with ✅. The counters for errors and warnings are incremented accordingly.

Canonical Status Validation

The first check ensures every entry’s status column contains an allowed canonical value or a known alias defined in templates/states.yml (lines 87‑112). It also forbids markdown bold syntax and embedded dates inside the status field, keeping the column clean for downstream processing.
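The rule can be approximated as below. The status values and the alias shown are placeholders, since the real list lives in templates/states.yml; the function name checkStatus, the case-insensitive comparison, and the ISO date pattern are likewise assumptions.

```javascript
// Placeholder values only: the real canonical states and aliases
// are defined in templates/states.yml.
const CANONICAL = new Set(["evaluated", "applied", "interview", "offer", "rejected"]);
const ALIASES = new Map([["sent", "applied"]]);

// Returns null when the status is acceptable, otherwise a short reason.
function checkStatus(status) {
  if (status.includes("**")) return "contains markdown bold";
  // Assumption: embedded dates look like YYYY-MM-DD.
  if (/\d{4}-\d{2}-\d{2}/.test(status)) return "contains a date";
  const key = status.trim().toLowerCase();
  if (CANONICAL.has(key) || ALIASES.has(key)) return null;
  return "non-canonical status";
}
```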

Duplicate Detection

To prevent redundant entries, the script builds a lookup map keyed by company plus role (lines 113‑129). Both values are normalized to lowercase and stripped to alphanumeric characters before hashing. Any key with more than one matching row is flagged as a possible duplicate.
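The normalization and grouping might look like the following sketch. The names normalizeKey and findDuplicates, and the "::" separator between the two parts of the key, are illustrative choices rather than details taken from the script.

```javascript
// Lowercase and keep only alphanumeric characters, as described above.
function normalizeKey(company, role) {
  const clean = (s) => s.toLowerCase().replace(/[^a-z0-9]/g, "");
  return `${clean(company)}::${clean(role)}`;
}

// Group records by normalized key and return the groups with 2+ rows.
function findDuplicates(records) {
  const groups = new Map();
  for (const rec of records) {
    const key = normalizeKey(rec.company, rec.role);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(rec);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}
```

Because punctuation and case are stripped, "Acme Corp" and "ACME Corp." collapse to the same key, which is why the result is best treated as a list of possible duplicates rather than confirmed ones.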

Report-Link Existence

For each row, the script extracts the markdown hyperlink from the report column and verifies that the referenced file exists (lines 130‑146). The lookup checks the path relative to the tracker directory and also from the repository root, accommodating legacy link structures.
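As a hedged sketch, the link extraction and the two-location lookup could be written like this. The function names, the regular expression, and the decision to skip rows without a link are assumptions; exists stands in for a file-existence check such as fs.existsSync.

```javascript
// Pull the target out of a markdown link such as [007](reports/007-acme.md).
function extractLinkTarget(cell) {
  const match = cell.match(/\[[^\]]*\]\(([^)]+)\)/);
  return match ? match[1] : null;
}

// Hypothetical check: try the tracker directory first, then the repo root.
function reportExists(cell, trackerDir, repoRoot, exists) {
  const target = extractLinkTarget(cell);
  // Assumption: a cell with no markdown link is not treated as broken.
  if (target === null) return true;
  return exists(`${trackerDir}/${target}`) || exists(`${repoRoot}/${target}`);
}
```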

Score Format Enforcement

The score column must match the pattern X/5 (for example, 4.2/5) or use the special tokens N/A and DUP (lines 148‑156). Any other format is rejected immediately to maintain consistent scoring semantics across the tracker.
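One way to express this rule is a single regular expression plus a small set of special tokens. The exact pattern used by the script is not shown in this article, so the one below is an assumption; for instance, it does not verify that the number is actually between 0 and 5.

```javascript
// Assumed pattern: an integer or decimal number followed by "/5".
const SCORE_PATTERN = /^\d+(\.\d+)?\/5$/;
const SPECIAL_SCORES = new Set(["N/A", "DUP"]);

function isValidScore(score) {
  const value = score.trim();
  return SPECIAL_SCORES.has(value) || SCORE_PATTERN.test(value);
}
```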

Row-Structure Integrity

This structural guard ensures every data row begins with | and contains at least nine pipe-delimited columns (lines 158‑168). Rows that fail this test are reported as malformed, protecting the parser from shifted or truncated data.
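A minimal version of this guard is sketched below. How the script counts columns is not spelled out here, so the approach of stripping the outer pipes before splitting, along with the name isWellFormedRow, is an assumption.

```javascript
// Hypothetical guard: a row must start with "|" and hold at least nine cells.
function isWellFormedRow(line, minColumns = 9) {
  if (!line.startsWith("|")) return false;
  const cells = line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|");
  return cells.length >= minColumns;
}
```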

Pending TSV Detection

The script inspects the batch/tracker-additions/ directory for any .tsv files that have not yet been merged into the main tracker (lines 172‑180). If unmerged files exist, a warning prompts the user to run the merge step so that staged data does not diverge from the canonical file.
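The detection itself can be reduced to a filter over the directory listing. In this sketch, entries is assumed to be the array of file names returned by something like fs.readdirSync on batch/tracker-additions/, every .tsv found there is assumed to be unmerged, and the warning text is invented for illustration.

```javascript
// Assumption: any .tsv still sitting in the directory has not been merged.
function pendingTsvs(entries) {
  return entries.filter((name) => name.toLowerCase().endsWith(".tsv"));
}

// Returns a warning string (hypothetical wording), or null when nothing is pending.
function pendingWarning(entries) {
  const pending = pendingTsvs(entries);
  if (pending.length === 0) return null;
  return `${pending.length} pending TSV file(s) in batch/tracker-additions/ have not been merged`;
}
```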

Bold-in-Score Warnings

Finally, the validator scans the score column for stray markdown bold markers (**) (lines 183‑190). Because bold formatting violates the tracker’s data contract, any instance is surfaced as a warning to preserve plain-text consistency.

Summary Reporting and Exit Codes

At the end of the run, the script aggregates results and prints a concise summary such as 📊 Pipeline Health: 0 errors, 2 warnings (lines 194‑204). A colour-coded status line follows: green when clean, yellow for warnings only, and red when errors exist. The process exits with process.exit(errors > 0 ? 1 : 0), allowing CI pipelines to fail automatically when integrity is compromised.
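The reporting logic reduces to a few lines. The summary format and the exit-code expression follow the description above; the summarize function and its returned object shape are illustrative only.

```javascript
// Hypothetical helper: derive the summary line, colour level, and exit code.
function summarize(errors, warnings) {
  const line = `📊 Pipeline Health: ${errors} errors, ${warnings} warnings`;
  // Red when errors exist, yellow for warnings only, green when clean.
  const level = errors > 0 ? "red" : warnings > 0 ? "yellow" : "green";
  const exitCode = errors > 0 ? 1 : 0;
  return { line, level, exitCode };
}
```

Note that warnings alone leave the exit code at 0, so a CI step built on this script fails only when at least one error is recorded.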

Running the Script Locally

Execute the validator from the repository root with the following command:

node verify-pipeline.mjs

Typical output for a healthy tracker looks like this:

📊 Checking 23 entries in applications.md

✅ All statuses are canonical
✅ No exact duplicates found
✅ All report links valid
✅ All scores valid
✅ All rows properly formatted
✅ No pending TSVs
✅ No bold in scores

--------------------------------------------------
📊 Pipeline Health: 0 errors, 0 warnings
🟢 Pipeline is clean!

When issues exist, the script prints the number of the offending entry (shown as #N) and a short description:

❌ #5: Non-canonical status "**Applied**"
⚠️ Possible duplicates: #12, #18 (Acme Corp — Senior Engineer)
❌ #7: Report not found: reports/007-acme-2023-05-01.md

CI Integration for Career-Ops

You can integrate verify-pipeline.mjs into package scripts or GitHub Actions to block corrupted data before it reaches the main branch.

Add a convenience script to package.json:

{
  "scripts": {
    "verify": "node verify-pipeline.mjs"
  }
}

Then reference it in a GitHub Action workflow:

- name: Verify tracker health
  run: npm run verify

Because the script exits with a non-zero status on error, the workflow step fails automatically and prevents the merge of invalid tracker data.

Summary

  • verify-pipeline.mjs resolves the tracker file via data/applications.md, a root-level fallback, or the CAREER_OPS_TRACKER environment variable.
  • The script parses every row into a structured record with fields such as company, role, status, score, and report.
  • Seven core checks safeguard the pipeline: canonical status validation, duplicate detection, report-link existence, score format enforcement, row-structure integrity, pending TSV detection, and bold-in-score warnings.
  • Errors and warnings are tallied throughout execution and summarized at the end (lines 194‑204).
  • The process exits with code 1 if any errors are found, making it ideal for CI gates and pre-commit hooks.

Frequently Asked Questions

What file does verify-pipeline.mjs check?

The script targets the job tracker, which defaults to data/applications.md inside the santifer/career-ops repository. If that path is missing, it falls back to a root-level applications.md, or it respects an override via the CAREER_OPS_TRACKER environment variable.

How does the script detect duplicate job applications?

It concatenates the company and role values for each row, normalizes them to lowercase, and strips non-alphanumeric characters to create a lookup key (lines 113‑129). If two or more rows share the same key, the script flags them as possible duplicates.

What happens when a report link is broken?

During the report-link check (lines 130‑146), the script extracts the markdown hyperlink from the report column and verifies that the file exists relative to the tracker directory or the repository root. If the file is not found, the script logs an error and ultimately exits with code 1.

Can I use verify-pipeline.mjs inside a GitHub Action?

Yes. The script is designed for automation because it returns a non-zero exit code whenever errors are detected. You can invoke it with npm run verify or node verify-pipeline.mjs as a step in any CI workflow, and the step will fail if the tracker contains any error-level violation. Warnings alone do not change the exit code.
