How the merge-tracker Pipeline Handles TSV Entries in career-ops
The merge-tracker pipeline ingests one-line TSV files from batch/tracker-additions/, validates canonical statuses against templates/states.yml, swaps score and status columns, normalizes report links to relative paths, and deduplicates by company-role pairs before writing to data/applications.md.
The santifer/career-ops repository automates job application tracking through a structured data pipeline. At the center of this workflow sits the merge-tracker utility, which bridges the gap between raw evaluation outputs and the permanent application tracker. This article explains exactly how the pipeline processes TSV entries to maintain data integrity in the career tracking system.
TSV Input Format and Source Location
The pipeline expects tab-separated value files in the batch/tracker-additions/ directory. Each file must follow the naming convention {num}-{company-slug}.tsv, where num represents the application number and company-slug is a normalized company identifier.
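The naming convention can be checked with a small pattern match. The following is a minimal sketch, not code from the repository: the helper name `parseAdditionName` and the exact regular expression are assumptions inferred from the `{num}-{company-slug}.tsv` convention described above.

```javascript
// Hypothetical pattern: digits, a hyphen, a lowercase hyphenated slug, then ".tsv".
// Inferred from the documented convention; the real script may be stricter or looser.
const ADDITION_NAME = /^(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)\.tsv$/;

// Returns { num, companySlug } for a conforming filename, or null otherwise.
function parseAdditionName(filename) {
  const match = ADDITION_NAME.exec(filename);
  if (!match) return null; // name does not follow the convention
  return { num: match[1], companySlug: match[2] };
}

console.log(parseAdditionName("001-acme.tsv"));
console.log(parseAdditionName("notes.txt"));
```

Keeping `num` as a string preserves leading zeros such as `001`.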
Required Column Structure
According to the repository's AGENTS.md documentation, TSV entries must contain nine tab-separated fields in this exact order:
- num: Application identifier
- date: Evaluation date (YYYY-MM-DD)
- company: Company name
- role: Job title
- status: Current application state
- score: Evaluation rating
- pdf: PDF availability indicator
- report: Markdown link to the evaluation report
- note: Free-text annotation
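To make the nine-field order concrete, here is a minimal sketch of how one TSV line could be split into a named record. The function `parseTsvLine` and the strict field-count check are illustrative assumptions, not the repository's actual parser.

```javascript
// Field names in the documented TSV order.
const TSV_FIELDS = [
  "num", "date", "company", "role", "status",
  "score", "pdf", "report", "note",
];

// Hypothetical parser: splits on tabs and maps each value to its field name.
function parseTsvLine(line) {
  const values = line.replace(/\r?\n$/, "").split("\t");
  if (values.length !== TSV_FIELDS.length) {
    throw new Error(`expected ${TSV_FIELDS.length} fields, got ${values.length}`);
  }
  const entry = {};
  TSV_FIELDS.forEach((name, i) => {
    entry[name] = values[i].trim();
  });
  return entry;
}

const line = "001\t2024-08-01\tAcme\tSenior Data Engineer\tEvaluated\t4.2/5\t✅\t[001](reports/001-acme-2024-08-01.md)\tGreat fit with ML pipeline\n";
console.log(parseTsvLine(line));
```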
Data Transformation Logic
When merge-tracker.mjs executes, it performs several non-destructive transformations to ensure consistency with the tracker schema.
Column Reordering for Schema Alignment
The raw TSV lists status before score, but data/applications.md expects score to appear first. The pipeline automatically swaps these two columns during ingestion to match the target layout.
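The swap itself is a two-element exchange. The sketch below assumes the fields arrive as an array in TSV order; `toTrackerOrder` is a hypothetical name, and the indices follow from the nine-column layout (status fifth, score sixth).

```javascript
// 0-based positions in the TSV order: status is fifth, score is sixth.
const STATUS_INDEX = 4;
const SCORE_INDEX = 5;

// Returns a new array with status and score exchanged; the input is not modified.
function toTrackerOrder(tsvFields) {
  const out = [...tsvFields];
  const status = out[STATUS_INDEX];
  out[STATUS_INDEX] = out[SCORE_INDEX];
  out[SCORE_INDEX] = status;
  return out;
}

console.log(toTrackerOrder(["001", "2024-08-01", "Acme", "Senior Data Engineer", "Evaluated", "4.2/5"]));
```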
Report Link Normalization
TSV files store report links as Markdown links whose targets are relative to the repository root (e.g., [123](reports/123-acme-2024-08-01.md)). Because the tracker lives in data/, the script rewrites each target to ../reports/... so the link resolves from the tracker file's location. The rewrite is idempotent: running it again leaves already-normalized links unchanged.
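One way to get an idempotent rewrite is to match only targets that begin with `reports/`. This is a sketch under that assumption; `normalizeReportLink` is a hypothetical name and the actual script may detect links differently.

```javascript
// Rewrites "](reports/..." to "](../reports/...".
// A target that already starts with "../reports/" does not match, so a second pass is a no-op.
function normalizeReportLink(link) {
  return link.replace(/\]\(reports\//, "](../reports/");
}

const once = normalizeReportLink("[123](reports/123-acme-2024-08-01.md)");
const twice = normalizeReportLink(once);
console.log(once);
console.log(once === twice);
```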
Canonical Status Enforcement
Before insertion, each status value is validated against the canonical definitions in templates/states.yml. Only approved states (e.g., Evaluated, Applied, Rejected) are accepted; non-canonical values are rejected or coerced to valid equivalents.
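The validation step can be pictured as a lookup against the canonical list. In the sketch below, the hard-coded array is a stand-in for whatever templates/states.yml actually contains (only the three states named above are included), and the case-insensitive match is an assumed coercion rule, not documented behavior.

```javascript
// Stand-in for the canonical list; the real file may define more states.
const CANONICAL_STATUSES = ["Evaluated", "Applied", "Rejected"];

// Returns the canonical spelling for a status, or null if it is not recognized.
// Case-insensitive matching is an assumption about how coercion might work.
function canonicalizeStatus(raw) {
  const wanted = raw.trim().toLowerCase();
  const hit = CANONICAL_STATUSES.find((s) => s.toLowerCase() === wanted);
  return hit === undefined ? null : hit;
}

console.log(canonicalizeStatus("applied"));
console.log(canonicalizeStatus("Ghosted"));
```

A caller could treat `null` as a signal to skip the entry or report it for manual correction.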
Deduplication by Composite Key
The pipeline checks for existing records matching the incoming company + role combination. When duplicates are detected, the script updates the existing row rather than appending a new entry, preventing tracker bloat.
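The update-or-append behavior can be sketched as an upsert keyed on company and role. The names `dedupKey` and `upsertRow` are hypothetical, and both the case-insensitive key and the policy of letting incoming fields overwrite existing ones are assumptions. The source only states that a matching row is updated rather than duplicated.

```javascript
// Composite key from company and role; lowercasing is an assumed normalization.
function dedupKey(row) {
  return `${row.company.trim().toLowerCase()}|${row.role.trim().toLowerCase()}`;
}

// Returns a new list: the matching row is updated in place, otherwise the entry is appended.
function upsertRow(rows, incoming) {
  const key = dedupKey(incoming);
  const index = rows.findIndex((row) => dedupKey(row) === key);
  if (index === -1) return [...rows, incoming];
  const next = [...rows];
  next[index] = { ...rows[index], ...incoming }; // assumed merge policy
  return next;
}

const tracker = [{ company: "Acme", role: "Senior Data Engineer", status: "Evaluated" }];
const updated = upsertRow(tracker, { company: "Acme", role: "Senior Data Engineer", status: "Applied" });
console.log(updated.length, updated[0].status);
```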
Running the merge-tracker Pipeline
Execute the transformation using Node.js from the repository root:
node merge-tracker.mjs
For bulk link repairs across the entire tracker:
node merge-tracker.mjs --migrate
Example TSV to Tracker Flow
Consider a file at batch/tracker-additions/001-acme.tsv containing:
001 2024-08-01 Acme Senior Data Engineer Evaluated 4.2/5 ✅ [001](reports/001-acme-2024-08-01.md) Great fit with ML pipeline
After processing, data/applications.md receives:
| 001 | 2024-08-01 | Acme | Senior Data Engineer | 4.2/5 | Evaluated | ✅ | [001](../reports/001-acme-2024-08-01.md) | Great fit with ML pipeline |
Notice the swapped score/status positions and the adjusted report link path.
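The example above can be reproduced end to end with a few lines. This is an illustrative sketch, not the repository's implementation: `tsvLineToTrackerRow` is a hypothetical name, and it covers only the column swap, the link rewrite, and the table-row formatting.

```javascript
// Converts one TSV line into a Markdown table row in tracker column order.
function tsvLineToTrackerRow(line) {
  const [num, date, company, role, status, score, pdf, report, note] = line.split("\t");
  const link = report.replace(/\]\(reports\//, "](../reports/");
  const cells = [num, date, company, role, score, status, pdf, link, note];
  return `| ${cells.join(" | ")} |`;
}

const sample = [
  "001", "2024-08-01", "Acme", "Senior Data Engineer", "Evaluated",
  "4.2/5", "✅", "[001](reports/001-acme-2024-08-01.md)", "Great fit with ML pipeline",
].join("\t");

console.log(tsvLineToTrackerRow(sample));
```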
Summary
- The merge-tracker pipeline processes TSV files from batch/tracker-additions/ using merge-tracker.mjs
- Input files must follow the {num}-{company-slug}.tsv naming convention with nine specific columns
- The script automatically swaps score and status columns to match the tracker schema
- Report links are normalized from root-relative to relative paths (../reports/)
- Only canonical statuses defined in templates/states.yml are accepted
- Deduplication logic updates existing entries when company-role pairs match, preventing duplicates
Frequently Asked Questions
What is the expected TSV file naming convention?
Files must reside in batch/tracker-additions/ and use the format {num}-{company-slug}.tsv, where num is the application number and company-slug is a lowercase, hyphenated company identifier.
Why does the pipeline swap the score and status columns?
The source TSV lists status in the fifth position and score in the sixth, but data/applications.md expects score to precede status. The merge-tracker.mjs script performs this swap automatically to maintain schema consistency without manual intervention.
How does merge-tracker prevent duplicate entries?
Before inserting new data, the pipeline queries existing records in data/applications.md for matching company and role combinations. If a match exists, the script updates that row rather than creating a duplicate entry.
What happens if a TSV contains an invalid status?
The pipeline validates all status values against the canonical list defined in templates/states.yml. Invalid or non-standard statuses are rejected during processing, ensuring the tracker maintains consistent state terminology across all entries.