Coordination Mechanism for Batch Processing Workers Using TSV Tracker-Additions

The coordination relies on a file-based hand-off protocol where headless workers write single-line TSV files to batch/tracker-additions/, and the merge-tracker.mjs script consolidates these into the canonical tracker while resolving duplicates through three distinct deduplication strategies.

In the santifer/career-ops repository, parallel evaluation of job offers is orchestrated without shared memory or network services. Instead, the system employs a simple file-based coordination mechanism where independent CLI agents record their progress via TSV tracker-additions that are later merged into the master application tracker.

The File-Based Hand-Off Protocol

The architecture decouples worker processes from the canonical state store using an intermediate filesystem directory. Each headless worker (a CLI AI agent defined in AGENTS.md) independently processes a single job offer and generates both a Markdown report and a PDF CV. Upon completion, the worker writes its tracking information as a single-line TSV file to:


batch/tracker-additions/

This directory acts as a transient staging area. According to the repository's architecture documentation in docs/ARCHITECTURE.md, the "Tracker TSV" block sits between the workers and the canonical tracker, enabling safe parallel writes without locking mechanisms.

TSV Format and Worker Contract

File Naming Convention

Workers must name their output files following the pattern {num}-{company-slug}.tsv, such as 001-acme-corp.tsv. This naming scheme keeps each file traceable to its offer and avoids filename collisions as long as every offer in a batch carries a distinct number.
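As a minimal sketch of how a worker might derive such a filename: the helper names companySlug and additionFileName are hypothetical, and the slug rules (lowercasing, stripping accents, collapsing punctuation into hyphens) and three-digit zero padding are assumptions inferred from the 001-acme-corp.tsv example, not code taken from the repository.

```javascript
// Hypothetical helper: these slug rules are an assumption, not the repository's code.
function companySlug(company) {
  return company
    .toLowerCase()
    .normalize('NFKD')                 // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .replace(/[^a-z0-9]+/g, '-')       // collapse any other run of characters into one hyphen
    .replace(/^-+|-+$/g, '');          // trim leading and trailing hyphens
}

// Hypothetical helper: builds {num}-{company-slug}.tsv with a zero-padded number.
function additionFileName(num, company) {
  const padded = String(num).padStart(3, '0'); // 1 -> "001", as in 001-acme-corp.tsv
  return `${padded}-${companySlug(company)}.tsv`;
}

console.log(additionFileName(1, 'Acme Corp')); // 001-acme-corp.tsv
```

Because the number is part of the name, two workers only collide if they are handed the same number and a company that slugs to the same string.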

Column Structure

Each TSV line contains the same columns as a row in the canonical tracker (data/applications.md): number, date, company, role, status, score, PDF-flag, report link, and optional notes. The merge-tracker.mjs script supports both 8-column and 9-column tab-separated formats, as well as pipe-delimited markdown rows.

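A typical TSV entry written by a worker looks like this. Note that this particular sample carries eight tab-separated fields: the PDF flag that would sit between the score and the report link is absent.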

001	2024-06-07	Acme Corp	Senior Backend Engineer	Evaluated	4.2/5	[001](reports/001-acme-corp-2024-06-07.md)	First evaluation – strong match

Columns in the full 9-column layout: num, date, company, role, status, score, pdf, report, notes.
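To make the column contract concrete, here is a rough sketch of parsing one addition line into a named record. The function name parseAdditionLine is hypothetical, the 'yes' PDF flag in the usage line is a placeholder value, and the sketch assumes that an 8-field line omits only the trailing notes column. It does not try to recover lines in which a middle column is missing.

```javascript
// Column names in the order listed above.
const COLUMNS = ['num', 'date', 'company', 'role', 'status', 'score', 'pdf', 'report', 'notes'];

// Hypothetical parser: turns one tab-separated line into a plain object.
function parseAdditionLine(line) {
  const parts = line.replace(/\r?\n$/, '').split('\t').map((p) => p.trim());
  if (parts.length !== 8 && parts.length !== 9) {
    throw new Error(`expected 8 or 9 tab-separated fields, got ${parts.length}`);
  }
  // Assumption: an 8-field line simply lacks the optional notes column.
  if (parts.length === 8) parts.push('');
  const entry = Object.fromEntries(COLUMNS.map((name, i) => [name, parts[i]]));
  entry.num = Number.parseInt(entry.num, 10);
  if (Number.isNaN(entry.num)) throw new Error(`invalid entry number: ${parts[0]}`);
  return entry;
}

const sample = ['001', '2024-06-07', 'Acme Corp', 'Senior Backend Engineer', 'Evaluated',
  '4.2/5', 'yes', '[001](reports/001-acme-corp-2024-06-07.md)', 'Strong match'].join('\t');
console.log(parseAdditionLine(sample).company); // Acme Corp
```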

The Merge Pipeline

The merge-tracker.mjs script serves as the central coordinator. It scans batch/tracker-additions/, parses each TSV, normalizes the report link relative to the tracker file, and merges entries into data/applications.md.

Parsing Flexibility

The script handles column order variations automatically. As implemented in merge-tracker.mjs, it detects whether the status or score appears first by testing column patterns:

// Detect column order: status-score or score-status
const col4 = parts[4].trim();
const col5 = parts[5].trim();
const col4LooksLikeScore = /^\d+\.?\d*\/5$/.test(col4) || col4 === 'N/A' || col4 === 'DUP';
const col5LooksLikeScore = /^\d+\.?\d*\/5$/.test(col5) || col5 === 'N/A' || col5 === 'DUP';
...
if (col4LooksLikeStatus && !col4LooksLikeScore) {
  statusCol = col4; scoreCol = col5;           // standard order
} else if (col4LooksLikeScore && col5LooksLikeStatus) {
  statusCol = col5; scoreCol = col4;           // swapped order
}
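The excerpt above elides how the status check is computed, so the following is only a self-contained sketch of the same idea. The score test is copied from the excerpt; looksLikeStatus (one or more alphabetic words that are not a score token) and the fallback to standard order when neither pattern matches are assumptions made for illustration, and detectStatusAndScore is a hypothetical name.

```javascript
const SCORE_RE = /^\d+\.?\d*\/5$/; // same pattern as the excerpt above

function looksLikeScore(value) {
  return SCORE_RE.test(value) || value === 'N/A' || value === 'DUP';
}

// Assumption: a status is one or more alphabetic words and is not a score token.
function looksLikeStatus(value) {
  return /^[A-Za-z][A-Za-z ]*$/.test(value) && !looksLikeScore(value);
}

// Hypothetical wrapper returning which of columns 4 and 5 holds status and score.
function detectStatusAndScore(parts) {
  const col4 = (parts[4] ?? '').trim();
  const col5 = (parts[5] ?? '').trim();
  if (looksLikeStatus(col4) && !looksLikeScore(col4)) {
    return { status: col4, score: col5, swapped: false }; // standard order
  }
  if (looksLikeScore(col4) && looksLikeStatus(col5)) {
    return { status: col5, score: col4, swapped: true };  // swapped order
  }
  // Assumption: when neither pattern matches, keep the standard order.
  return { status: col4, score: col5, swapped: false };
}

console.log(detectStatusAndScore(['1', '2024-06-07', 'Acme', 'Dev', '4.2/5', 'Evaluated']).swapped); // true
```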

Three-Layer Deduplication Strategy

The merge routine checks each addition against the existing tracker rows in three layers:

  1. Exact report number match – If the report link already exists in the tracker, the addition is ignored entirely.
  2. Exact entry number match – Allows re-evaluation of existing entries; the newer entry replaces the old one only when its score is higher.
  3. Fuzzy company + role match – Uses normalized company names and role-token similarity tests (ignoring common stop-words) to detect duplicates. When found, the higher-scoring offer updates the existing row.

This logic appears in merge-tracker.mjs around lines 50-58 and 70-77, ensuring that multiple evaluations of the same job offer converge to the highest-scoring entry.
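The three layers can be sketched as a single decision function. This is an illustration under stated assumptions rather than the script's actual code: resolveAddition and its helpers are hypothetical names, the stop-word list and the 0.6 token-overlap threshold are invented for the example, and non-numeric scores such as N/A are assumed to rank as 0.

```javascript
// Hypothetical stop-word list; the real script's list is not shown in this article.
const STOP_WORDS = new Set(['a', 'an', 'and', 'the', 'of', 'for', 'in', 'at', 'to']);

function scoreValue(score) {
  const m = /^(\d+\.?\d*)\/5$/.exec(String(score).trim());
  return m ? Number.parseFloat(m[1]) : 0; // assumption: N/A and DUP rank lowest
}

function normalizeCompany(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, '');
}

function roleTokens(role) {
  return role.toLowerCase().split(/[^a-z0-9+#]+/).filter((t) => t && !STOP_WORDS.has(t));
}

// Assumed similarity test: share of the smaller token set found in the other one.
function rolesSimilar(a, b, threshold = 0.6) {
  const ta = new Set(roleTokens(a));
  const tb = new Set(roleTokens(b));
  if (ta.size === 0 || tb.size === 0) return false;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / Math.min(ta.size, tb.size) >= threshold;
}

function resolveAddition(addition, rows) {
  // Layer 1: the report link already exists, so the addition is ignored.
  if (addition.report && rows.some((r) => r.report === addition.report)) {
    return { action: 'skip' };
  }
  // Layer 2: exact entry number match.
  let match = rows.find((r) => r.num === addition.num);
  // Layer 3: fuzzy company + role match.
  if (!match) {
    match = rows.find((r) =>
      normalizeCompany(r.company) === normalizeCompany(addition.company) &&
      rolesSimilar(r.role, addition.role));
  }
  if (!match) return { action: 'insert' };
  // Layers 2 and 3 replace the existing row only when the new score is higher.
  return scoreValue(addition.score) > scoreValue(match.score)
    ? { action: 'update', target: match.num }
    : { action: 'skip' };
}

const rows = [{ num: 1, company: 'Acme Corp', role: 'Senior Backend Engineer',
  score: '3.8/5', report: '[001](reports/001-acme-corp.md)' }];
const incoming = { num: 7, company: 'ACME Corp.', role: 'Backend Engineer (Senior)',
  score: '4.5/5', report: '[007](reports/007-acme-corp.md)' };
console.log(resolveAddition(incoming, rows).action); // update
```

Returning an action object instead of mutating the rows keeps the rules easy to test in isolation; the caller decides how to apply an update or insert.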

Insertion Logic

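For new entries, the script keeps the number supplied in the TSV when it is greater than the highest number already in the tracker; otherwise it assigns the next sequential number. The row is then inserted into the tracker table just below the header separator. Note that the row is written with score before status, the reverse of the TSV column order: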

if (!duplicate) {
  const entryNum = addition.num > maxNum ? addition.num : ++maxNum;
  const newLine = `| ${entryNum} | ${addition.date} | ${addition.company} | ${addition.role} | ${addition.score} | ${addition.status} | ${addition.pdf} | ${addition.report} | ${addition.notes} |`;
  newLines.push(newLine);
  added++;
}
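For context, here is a hedged sketch of how rows built this way could be spliced in below the header separator. The function names formatRow and insertRows, the separator-detection regex, and the way maxNum is derived from existing rows are assumptions for illustration. The sketch also raises maxNum after accepting a supplied number so that later additions cannot reuse it, which the excerpt above does not show.

```javascript
// Same row layout as the excerpt above: score precedes status in the tracker.
function formatRow(entryNum, a) {
  return `| ${entryNum} | ${a.date} | ${a.company} | ${a.role} | ${a.score} | ${a.status} | ${a.pdf} | ${a.report} | ${a.notes} |`;
}

// Hypothetical helper: returns a new array of tracker lines with additions inserted.
function insertRows(appLines, additions) {
  // Assumption: the first line made only of pipes, dashes, colons and spaces is the separator.
  const sepIndex = appLines.findIndex((l) => /^\|[\s:|-]+\|$/.test(l.trim()) && l.includes('-'));
  if (sepIndex === -1) throw new Error('tracker table separator not found');

  // Assumption: maxNum is the largest leading number among existing table rows.
  let maxNum = 0;
  for (const line of appLines.slice(sepIndex + 1)) {
    const m = /^\|\s*(\d+)\s*\|/.exec(line);
    if (m) maxNum = Math.max(maxNum, Number.parseInt(m[1], 10));
  }

  const newLines = [];
  for (const a of additions) {
    const entryNum = a.num > maxNum ? a.num : ++maxNum;
    maxNum = Math.max(maxNum, entryNum); // sketch-only safeguard against reuse
    newLines.push(formatRow(entryNum, a));
  }
  return [...appLines.slice(0, sepIndex + 1), ...newLines, ...appLines.slice(sepIndex + 1)];
}
```

Building a new array rather than editing appLines in place makes it straightforward to support a dry run: the caller can print the result without writing it.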

Post-Merge Cleanup and Coordination Flow

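After processing all pending TSVs, the script moves the processed files into batch/tracker-additions/merged/ to prevent re-processing in subsequent runs. The cleanup logic in merge-tracker.mjs writes the tracker, creates the directory if needed, and moves each file with renameSync. A rename is atomic per file when source and destination are on the same filesystem, although the loop as a whole is not a single atomic step. All of this is skipped in dry-run mode: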

if (!DRY_RUN) {
  writeFileSync(APPS_FILE, appLines.join('\n'));
  if (!existsSync(MERGED_DIR)) mkdirSync(MERGED_DIR, { recursive: true });
  for (const file of tsvFiles) {
    renameSync(join(ADDITIONS_DIR, file), join(MERGED_DIR, file));
  }
}

This file-based hand-off eliminates the need for database locks, message queues, or shared memory. Workers operate independently, writing TSV files to a known location, while the merge script provides eventual consistency by consolidating additions and resolving conflicts through deterministic rules.

Summary

  • File-based coordination: Workers write to batch/tracker-additions/ using the {num}-{company-slug}.tsv naming pattern.
  • Flexible parsing: merge-tracker.mjs handles 8- or 9-column TSV formats and detects column order automatically.
  • Deduplication: Three strategies (exact report, exact entry with score comparison, fuzzy company+role) prevent duplicates while allowing quality updates.
  • Cleanup: Processed files move to batch/tracker-additions/merged/ after consolidation into data/applications.md, one rename per file.
  • Stateless workers: The architecture requires no shared memory or network services and relies entirely on filesystem operations, as described in AGENTS.md and docs/ARCHITECTURE.md.

Frequently Asked Questions

What is the exact file naming pattern for TSV tracker-additions?

Workers must use the format {num}-{company-slug}.tsv, such as 001-acme-corp.tsv, where num is the entry number and company-slug is a normalized version of the company name. This pattern is defined in the batch workflow documentation and ensures unique, traceable filenames during parallel processing.

How does merge-tracker.mjs handle duplicate entries?

The script implements three deduplication layers: it ignores additions whose report link already exists, replaces an existing entry matched by entry number only when the new score is higher, and detects fuzzy duplicates by comparing normalized company names and role tokens. In the second and third layers the higher-scoring entry takes precedence; in the first, the addition is dropped regardless of score.

Where do processed TSV files move after merging?

After successful consolidation into the canonical tracker, files are moved from batch/tracker-additions/ to batch/tracker-additions/merged/. This prevents re-processing in subsequent batch runs and provides an audit trail of integrated submissions.

What columns are required in the TSV format?

The TSV must contain the same columns as the canonical tracker: number, date, company, role, status, score, PDF-flag, report link, and optional notes. The merge script supports both standard and swapped column orders for status and score, and accepts both tab-separated values and pipe-delimited markdown rows.
