# How the merge-tracker Pipeline Handles TSV Entries in career-ops

> Discover how the merge-tracker pipeline processes TSV entries in career-ops. Learn about validation, column swapping, link normalization, and deduplication for efficient data management.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: internals
- Published: 2026-06-09

---

**The merge-tracker pipeline ingests one-line TSV files from `batch/tracker-additions/`, validates canonical statuses against [`templates/states.yml`](https://github.com/santifer/career-ops/blob/main/templates/states.yml), swaps score and status columns, normalizes report links to relative paths, and deduplicates by company-role pairs before writing to [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md).**

The `santifer/career-ops` repository automates job application tracking through a structured data pipeline. At the center of this workflow sits the **merge-tracker** utility, which bridges the gap between raw evaluation outputs and the permanent application tracker. This article explains exactly how the pipeline processes **TSV entries** to maintain data integrity in the career tracking system.

## TSV Input Format and Source Location

The pipeline expects tab-separated value files in the `batch/tracker-additions/` directory. Each file must follow the naming convention `{num}-{company-slug}.tsv`, where `num` represents the application number and `company-slug` is a normalized company identifier.

### Required Column Structure

According to the repository's [`AGENTS.md`](https://github.com/santifer/career-ops/blob/main/AGENTS.md) documentation, TSV entries must contain nine tab-separated fields in this exact order:

- `num`: Application identifier
- `date`: Evaluation date (YYYY-MM-DD)
- `company`: Company name
- `role`: Job title
- `status`: Current application state
- `score`: Evaluation rating
- `pdf`: PDF availability indicator
- `report`: Markdown link to the evaluation report
- `note`: Free-text annotation

## Data Transformation Logic

When `merge-tracker.mjs` executes, it performs several non-destructive transformations to ensure consistency with the tracker schema.

### Column Reordering for Schema Alignment

The raw TSV lists **status** before **score**, but [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) expects **score** to appear first. The pipeline automatically swaps these two columns during ingestion to match the target layout.

### Report Link Normalization

TSV files store report links as root-relative Markdown (e.g., `[123](reports/123-acme-2024-08-01.md)`). The script rewrites these to relative paths (`../reports/...`) based on the tracker file's location within `data/`. This normalization is idempotent, ensuring existing links remain valid without duplication.

### Canonical Status Enforcement

Before insertion, each status value is validated against the canonical definitions in [`templates/states.yml`](https://github.com/santifer/career-ops/blob/main/templates/states.yml). Only approved states (e.g., `Evaluated`, `Applied`, `Rejected`) are accepted; non-canonical values are rejected or coerced to valid equivalents.

### Deduplication by Composite Key

The pipeline checks for existing records matching the incoming **company** + **role** combination. When duplicates are detected, the script updates the existing row rather than appending a new entry, preventing tracker bloat.

## Running the merge-tracker Pipeline

Execute the transformation using Node.js from the repository root:

```bash
node merge-tracker.mjs

```

For bulk link repairs across the entire tracker:

```bash
node merge-tracker.mjs --migrate

```

### Example TSV to Tracker Flow

Consider a file at `batch/tracker-additions/001-acme.tsv` containing:

```text
001	2024-08-01	Acme	Senior Data Engineer	Evaluated	4.2/5	✅	[001](reports/001-acme-2024-08-01.md)	Great fit with ML pipeline

```

After processing, [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) receives:

```markdown
| 001 | 2024-08-01 | Acme | Senior Data Engineer | 4.2/5 | Evaluated | ✅ | [001](../reports/001-acme-2024-08-01.md) | Great fit with ML pipeline |

```

Notice the swapped score/status positions and the adjusted report link path.

## Summary

- The **merge-tracker** pipeline processes TSV files from `batch/tracker-additions/` using `merge-tracker.mjs`
- Input files must follow the `{num}-{company-slug}.tsv` naming convention with nine specific columns
- The script automatically **swaps score and status columns** to match the tracker schema
- **Report links** are normalized from root-relative to relative paths (`../reports/`)
- Only **canonical statuses** defined in [`templates/states.yml`](https://github.com/santifer/career-ops/blob/main/templates/states.yml) are accepted
- **Deduplication** logic updates existing entries when company-role pairs match, preventing duplicates

## Frequently Asked Questions

### What is the expected TSV file naming convention?

Files must reside in `batch/tracker-additions/` and use the format `{num}-{company-slug}.tsv`, where `num` is the application number and `company-slug` is a lowercase, hyphenated company identifier.

### Why does the pipeline swap the score and status columns?

The source TSV lists status in the fifth position and score in the sixth, but [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) expects score to precede status. The `merge-tracker.mjs` script performs this swap automatically to maintain schema consistency without manual intervention.

### How does merge-tracker prevent duplicate entries?

Before inserting new data, the pipeline queries existing records in [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) for matching **company** and **role** combinations. If a match exists, the script updates that row rather than creating a duplicate entry.

### What happens if a TSV contains an invalid status?

The pipeline validates all status values against the canonical list defined in [`templates/states.yml`](https://github.com/santifer/career-ops/blob/main/templates/states.yml). Invalid or non-standard statuses are rejected during processing, ensuring the tracker maintains consistent state terminology across all entries.