# Coordination Mechanism for Batch Processing Workers Using TSV Tracker-Additions

> Discover the file-based coordination mechanism for batch processing workers using TSV tracker-additions. Learn how headless workers and merge scripts manage duplicates efficiently.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: internals
- Published: 2026-06-07

---

**The coordination relies on a file-based hand-off protocol where headless workers write single-line TSV files to `batch/tracker-additions/`, and the `merge-tracker.mjs` script consolidates these into the canonical tracker while resolving duplicates through three distinct deduplication strategies.**

In the `santifer/career-ops` repository, parallel evaluation of job offers is orchestrated without shared memory or network services. Instead, the system employs a simple file-based coordination mechanism where independent CLI agents record their progress via TSV tracker-additions that are later merged into the master application tracker.

## The File-Based Hand-Off Protocol

The architecture decouples worker processes from the canonical state store using an intermediate filesystem directory. Each headless worker (a CLI AI agent defined in [`AGENTS.md`](https://github.com/santifer/career-ops/blob/main/AGENTS.md)) independently processes a single job offer and generates both a Markdown report and a PDF CV. Upon completion, the worker writes its tracking information as a **single-line TSV file** to:

```
batch/tracker-additions/
```

This directory acts as a transient staging area. According to the repository's architecture documentation in [`docs/ARCHITECTURE.md`](https://github.com/santifer/career-ops/blob/main/docs/ARCHITECTURE.md), the "Tracker TSV" block sits between the workers and the canonical tracker. Because each worker writes only its own file and never touches the tracker directly, parallel writes are safe without locking mechanisms.

## TSV Format and Worker Contract

### File Naming Convention

Workers must name their output files following the pattern `{num}-{company-slug}.tsv`, such as `001-acme-corp.tsv`. This naming scheme ensures traceability while preventing filename collisions during batch processing.
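As a rough illustration of the naming pattern, a worker could derive the filename as follows. The `slugify` and `additionFileName` helpers are hypothetical, and both the slug rules and the three-digit zero padding are assumptions inferred from the `001-acme-corp.tsv` example rather than taken from the repository.

```javascript
// Hypothetical helper: the repository's actual slug rules may differ.
function slugify(company) {
  return company
    .toLowerCase()
    .normalize('NFKD')                 // split accented characters into base + mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .replace(/[^a-z0-9]+/g, '-')       // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, '');          // trim leading and trailing hyphens
}

// Build `{num}-{company-slug}.tsv`; three-digit padding is assumed from the example.
function additionFileName(num, company) {
  return `${String(num).padStart(3, '0')}-${slugify(company)}.tsv`;
}

console.log(additionFileName(1, 'Acme Corp')); // 001-acme-corp.tsv
```

Collisions are avoided only as long as each worker receives a distinct `num`; two workers given the same number and company would still overwrite each other.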

### Column Structure

Each TSV line carries the same fields as a row in the canonical tracker ([`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md)): number, date, company, role, status, score, PDF flag, report link, and optional notes. Because notes are optional, the `merge-tracker.mjs` script accepts both 8-column and 9-column tab-separated lines, as well as pipe-delimited markdown rows.

A typical 9-column TSV entry written by a worker looks like this:

```tsv
001	2024-06-07	Acme Corp	Senior Backend Engineer	Evaluated	4.2/5	✅	[001](reports/001-acme-corp-2024-06-07.md)	First evaluation – strong match

```

*Columns*: `num`, `date`, `company`, `role`, `status`, `score`, `pdf`, `report`, `notes`.
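To make the column contract concrete, here is a minimal sketch of reading one such line into an object. `parseAdditionLine` is a hypothetical function, not the parser in `merge-tracker.mjs`; it assumes the standard status-then-score order and does not handle pipe-delimited rows.

```javascript
// Column names in the order listed above.
const COLUMNS = ['num', 'date', 'company', 'role', 'status', 'score', 'pdf', 'report', 'notes'];

// Hypothetical parser: accepts 8 columns (no notes) or 9 columns (with notes).
function parseAdditionLine(line) {
  const parts = line.replace(/\r?\n$/, '').split('\t').map((p) => p.trim());
  if (parts.length < 8 || parts.length > 9) {
    throw new Error(`expected 8 or 9 tab-separated columns, got ${parts.length}`);
  }
  const entry = {};
  COLUMNS.forEach((name, i) => { entry[name] = parts[i] ?? ''; }); // missing notes become ''
  entry.num = Number.parseInt(entry.num, 10);                      // '001' -> 1
  return entry;
}

const sample = ['001', '2024-06-07', 'Acme Corp', 'Senior Backend Engineer',
  'Evaluated', '4.2/5', '✅', '[001](reports/001-acme-corp-2024-06-07.md)'].join('\t');
console.log(parseAdditionLine(sample));
```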

## The Merge Pipeline

The `merge-tracker.mjs` script serves as the central coordinator. It scans `batch/tracker-additions/`, parses each TSV, normalizes the report link relative to the tracker file, and merges entries into [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md).

### Parsing Flexibility

The script handles column order variations automatically. As implemented in `merge-tracker.mjs`, it detects whether the status or score appears first by testing column patterns:

```javascript
// Detect column order: status-score or score-status
const col4 = parts[4].trim();
const col5 = parts[5].trim();
const col4LooksLikeScore = /^\d+\.?\d*\/5$/.test(col4) || col4 === 'N/A' || col4 === 'DUP';
const col5LooksLikeScore = /^\d+\.?\d*\/5$/.test(col5) || col5 === 'N/A' || col5 === 'DUP';
// ... (excerpt: the checks that set col4LooksLikeStatus and col5LooksLikeStatus are omitted)
if (col4LooksLikeStatus && !col4LooksLikeScore) {
  statusCol = col4; scoreCol = col5;           // standard order
} else if (col4LooksLikeScore && col5LooksLikeStatus) {
  statusCol = col5; scoreCol = col4;           // swapped order
}
```
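The excerpt above leaves out how a value is recognized as a status. A self-contained sketch of the same idea might look like this; the `KNOWN_STATUSES` set, the helper names, and the fallback to standard order are assumptions made for illustration, while the score pattern is copied from the excerpt.

```javascript
// Assumption: a small illustrative status vocabulary; the real script's list may differ.
const KNOWN_STATUSES = new Set(['evaluated', 'applied', 'rejected', 'discarded']);

// Score pattern taken from the excerpt: "4.2/5", "N/A", or "DUP".
const looksLikeScore = (v) => /^\d+\.?\d*\/5$/.test(v) || v === 'N/A' || v === 'DUP';
const looksLikeStatus = (v) => KNOWN_STATUSES.has(v.toLowerCase());

// Hypothetical helper that returns status and score regardless of their order.
function resolveStatusAndScore(col4, col5) {
  const a = col4.trim();
  const b = col5.trim();
  if (looksLikeStatus(a) && !looksLikeScore(a)) return { status: a, score: b }; // standard order
  if (looksLikeScore(a) && looksLikeStatus(b)) return { status: b, score: a };  // swapped order
  return { status: a, score: b }; // assumed fallback: treat as standard order
}

console.log(resolveStatusAndScore('4.2/5', 'Evaluated')); // { status: 'Evaluated', score: '4.2/5' }
```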

### Three-Layer Deduplication Strategy

The merge routine checks each addition against the existing tracker rows in three layers, applied in order:

1. **Exact report number match** – If the report link already exists in the tracker, the addition is ignored entirely.
2. **Exact entry number match** – Allows re-evaluation of existing entries; the newer entry replaces the old one only when its score is higher.
3. **Fuzzy company + role match** – Uses normalized company names and role-token similarity tests (ignoring common stop-words) to detect duplicates. When found, the higher-scoring offer updates the existing row.

This logic appears in `merge-tracker.mjs` around lines 50-58 and 70-77, so that multiple evaluations of the same job offer converge on the highest-scoring entry.
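The three layers can be sketched as a single decision function. This is an illustrative reconstruction under stated assumptions, not the code in `merge-tracker.mjs`: the stop-word list, the two-shared-token threshold, and every helper name (`decide`, `rolesSimilar`, `normalizeCompany`, `parseScore`) are hypothetical.

```javascript
const STOP_WORDS = new Set(['the', 'a', 'an', 'of', 'and', 'for', 'in', 'at']); // hypothetical list

// "4.2/5" -> 4.2; anything else (such as "N/A") counts as 0 in this sketch.
const parseScore = (s) => { const m = /^(\d+\.?\d*)\/5$/.exec(s); return m ? Number(m[1]) : 0; };
const normalizeCompany = (c) => c.toLowerCase().replace(/[^a-z0-9]/g, '');
const roleTokens = (r) => r.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t && !STOP_WORDS.has(t));

// Assumed similarity test: at least two distinct role tokens in common.
function rolesSimilar(a, b) {
  const ta = new Set(roleTokens(a));
  let shared = 0;
  for (const t of new Set(roleTokens(b))) if (ta.has(t)) shared++;
  return shared >= 2;
}

function decide(existingRows, addition) {
  // Layer 1: the report link is already tracked, so ignore the addition.
  if (existingRows.some((r) => r.report === addition.report)) return { action: 'skip' };
  // Layer 2: same entry number; Layer 3: fuzzy company + role match.
  const match = existingRows.find((r) => r.num === addition.num)
    ?? existingRows.find((r) => normalizeCompany(r.company) === normalizeCompany(addition.company)
      && rolesSimilar(r.role, addition.role));
  if (!match) return { action: 'add' };
  // For layers 2 and 3, only a strictly higher score replaces the existing row.
  return parseScore(addition.score) > parseScore(match.score)
    ? { action: 'update', target: match.num }
    : { action: 'skip' };
}

const tracker = [{ num: 1, company: 'Acme Corp', role: 'Senior Backend Engineer',
  score: '4.0/5', report: '[001](reports/001-acme-corp.md)' }];
console.log(decide(tracker, { num: 7, company: 'ACME Corp.', role: 'Backend Engineer (Senior)',
  score: '4.6/5', report: '[007](reports/007-acme-corp.md)' })); // { action: 'update', target: 1 }
```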

### Insertion Logic

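For new entries, the script keeps the number supplied in the TSV when it is greater than the current maximum; otherwise it assigns the next sequential number. The row is then added to the tracker table just below the header separator. Note that the generated markdown row places score before status, the reverse of the TSV column order: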

```javascript
if (!duplicate) {
  const entryNum = addition.num > maxNum ? addition.num : ++maxNum;
  const newLine = `| ${entryNum} | ${addition.date} | ${addition.company} | ${addition.role} | ${addition.score} | ${addition.status} | ${addition.pdf} | ${addition.report} | ${addition.notes} |`;
  newLines.push(newLine);
  added++;
}
```
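The numbering rule is easiest to see on a small batch. The sketch below isolates that one expression in a hypothetical `assignNumbers` helper; raising `maxNum` after a supplied number is accepted is an assumption of this sketch, since the excerpt does not show whether the script does so.

```javascript
// Hypothetical helper isolating the numbering rule from the excerpt above.
function assignNumbers(existingNums, suppliedNums) {
  let maxNum = existingNums.length ? Math.max(...existingNums) : 0;
  return suppliedNums.map((num) => {
    // Keep the supplied number only if it is beyond the current maximum.
    const entryNum = num > maxNum ? num : ++maxNum;
    // Assumption: track accepted numbers so later entries cannot reuse them.
    maxNum = Math.max(maxNum, entryNum);
    return entryNum;
  });
}

// Tracker already holds entries 1-3; three additions arrive numbered 5, 2 and 4.
console.log(assignNumbers([1, 2, 3], [5, 2, 4])); // [ 5, 6, 7 ]
```

In this run the addition numbered 5 keeps its number, while 2 and 4 are renumbered to 6 and 7 because they no longer exceed the running maximum.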

## Post-Merge Cleanup and Coordination Flow

After processing all pending TSVs, the script moves the processed files into `batch/tracker-additions/merged/` to prevent re-processing in subsequent runs. Unless it is running in dry-run mode, `merge-tracker.mjs` writes the updated tracker, creates the `merged/` directory if it does not exist, and moves each file with `renameSync`:

```javascript
if (!DRY_RUN) {
  writeFileSync(APPS_FILE, appLines.join('\n'));
  if (!existsSync(MERGED_DIR)) mkdirSync(MERGED_DIR, { recursive: true });
  for (const file of tsvFiles) {
    renameSync(join(ADDITIONS_DIR, file), join(MERGED_DIR, file));
  }
}
```

This file-based hand-off eliminates the need for database locks, message queues, or shared memory. Workers operate independently, writing TSV files to a known location, while the merge script provides eventual consistency by consolidating additions and resolving conflicts through deterministic rules.

## Summary

- **File-based coordination**: Workers write to `batch/tracker-additions/` using the `{num}-{company-slug}.tsv` naming pattern.
- **Flexible parsing**: `merge-tracker.mjs` handles 8- or 9-column TSV formats and detects column order automatically.
- **Deduplication**: Three strategies (exact report, exact entry with score comparison, fuzzy company+role) prevent duplicates while allowing quality updates.
- **Cleanup**: Processed files move to `batch/tracker-additions/merged/` after successful consolidation into [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md).
- **Stateless workers**: The architecture requires no shared memory or network services, relying entirely on the filesystem conventions described in [`AGENTS.md`](https://github.com/santifer/career-ops/blob/main/AGENTS.md) and [`docs/ARCHITECTURE.md`](https://github.com/santifer/career-ops/blob/main/docs/ARCHITECTURE.md).

## Frequently Asked Questions

### What is the exact file naming pattern for TSV tracker-additions?

Workers must use the format `{num}-{company-slug}.tsv`, such as `001-acme-corp.tsv`, where `num` is the entry number and `company-slug` is a normalized version of the company name. This pattern is defined in the batch workflow documentation and ensures unique, traceable filenames during parallel processing.

### How does `merge-tracker.mjs` handle duplicate entries?

The script implements three deduplication layers. It ignores additions whose report link already exists in the tracker, regardless of score. For additions that match an existing entry by number, it replaces the existing row only when the new score is higher. Finally, it detects fuzzy duplicates by comparing normalized company names and role tokens, and again lets the higher-scoring entry update the existing row.

### Where do processed TSV files move after merging?

After successful consolidation into the canonical tracker, files are moved from `batch/tracker-additions/` to `batch/tracker-additions/merged/`. This prevents re-processing in subsequent batch runs and provides an audit trail of integrated submissions.

### What columns are required in the TSV format?

The TSV must contain the same columns as the canonical tracker: number, date, company, role, status, score, PDF-flag, report link, and optional notes. The merge script supports both standard and swapped column orders for status and score, and accepts both tab-separated values and pipe-delimited markdown rows.