How the merge-tracker Pipeline Handles TSV Entries in career-ops

The merge-tracker pipeline ingests one-line TSV files from batch/tracker-additions/, validates canonical statuses against templates/states.yml, swaps score and status columns, normalizes report links to relative paths, and deduplicates by company-role pairs before writing to data/applications.md.

The santifer/career-ops repository automates job application tracking through a structured data pipeline. At the center of this workflow sits the merge-tracker utility, which bridges the gap between raw evaluation outputs and the permanent application tracker. This article explains exactly how the pipeline processes TSV entries to maintain data integrity in the career tracking system.

TSV Input Format and Source Location

The pipeline expects tab-separated value files in the batch/tracker-additions/ directory. Each file must follow the naming convention {num}-{company-slug}.tsv, where num represents the application number and company-slug is a normalized company identifier.

Required Column Structure

According to the repository's AGENTS.md documentation, TSV entries must contain nine tab-separated fields in this exact order:

  • num: Application identifier
  • date: Evaluation date (YYYY-MM-DD)
  • company: Company name
  • role: Job title
  • status: Current application state
  • score: Evaluation rating
  • pdf: PDF availability indicator
  • report: Markdown link to the evaluation report
  • note: Free-text annotation

Data Transformation Logic

When merge-tracker.mjs executes, it performs several non-destructive transformations to ensure consistency with the tracker schema.

Column Reordering for Schema Alignment

The raw TSV lists status before score, but data/applications.md expects score to appear first. The pipeline automatically swaps these two columns during ingestion to match the target layout.

TSV files store report links as root-relative Markdown (e.g., [123](reports/123-acme-2024-08-01.md)). The script rewrites these to relative paths (../reports/...) based on the tracker file's location within data/. This normalization is idempotent, ensuring existing links remain valid without duplication.

Canonical Status Enforcement

Before insertion, each status value is validated against the canonical definitions in templates/states.yml. Only approved states (e.g., Evaluated, Applied, Rejected) are accepted; non-canonical values are rejected or coerced to valid equivalents.

Deduplication by Composite Key

The pipeline checks for existing records matching the incoming company + role combination. When duplicates are detected, the script updates the existing row rather than appending a new entry, preventing tracker bloat.

Running the merge-tracker Pipeline

Execute the transformation using Node.js from the repository root:

node merge-tracker.mjs

For bulk link repairs across the entire tracker:

node merge-tracker.mjs --migrate

Example TSV to Tracker Flow

Consider a file at batch/tracker-additions/001-acme.tsv containing:

001	2024-08-01	Acme	Senior Data Engineer	Evaluated	4.2/5	✅	[001](reports/001-acme-2024-08-01.md)	Great fit with ML pipeline

After processing, data/applications.md receives:

| 001 | 2024-08-01 | Acme | Senior Data Engineer | 4.2/5 | Evaluated | ✅ | [001](../reports/001-acme-2024-08-01.md) | Great fit with ML pipeline |

Notice the swapped score/status positions and the adjusted report link path.

Summary

  • The merge-tracker pipeline processes TSV files from batch/tracker-additions/ using merge-tracker.mjs
  • Input files must follow the {num}-{company-slug}.tsv naming convention with nine specific columns
  • The script automatically swaps score and status columns to match the tracker schema
  • Report links are normalized from root-relative to relative paths (../reports/)
  • Only canonical statuses defined in templates/states.yml are accepted
  • Deduplication logic updates existing entries when company-role pairs match, preventing duplicates

Frequently Asked Questions

What is the expected TSV file naming convention?

Files must reside in batch/tracker-additions/ and use the format {num}-{company-slug}.tsv, where num is the application number and company-slug is a lowercase, hyphenated company identifier.

Why does the pipeline swap the score and status columns?

The source TSV lists status in the fifth position and score in the sixth, but data/applications.md expects score to precede status. The merge-tracker.mjs script performs this swap automatically to maintain schema consistency without manual intervention.

How does merge-tracker prevent duplicate entries?

Before inserting new data, the pipeline queries existing records in data/applications.md for matching company and role combinations. If a match exists, the script updates that row rather than creating a duplicate entry.

What happens if a TSV contains an invalid status?

The pipeline validates all status values against the canonical list defined in templates/states.yml. Invalid or non-standard statuses are rejected during processing, ensuring the tracker maintains consistent state terminology across all entries.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →