How to Normalize Canonical Statuses Across Tracker Entries in Career-Ops
The normalize-statuses.mjs script enforces data integrity by scanning applications.md and remapping any non-canonical status values to one of eight defined states listed in templates/states.yml.
In the santifer/career-ops repository, tracker entries for job applications rely on a strict taxonomy of canonical statuses to ensure consistent reporting. The system uses a Node.js normalization script to automatically detect and correct malformed or language-specific status values, converting them to a standardized English set defined in a YAML configuration file. This process prevents downstream analytics errors and maintains a single source of truth across localized entries.
The Canonical Status Schema
According to the santifer/career-ops source code, the authoritative list of valid states resides in templates/states.yml. This file defines eight canonical labels, each with a set of accepted aliases (mostly Spanish) that map to it, which supports localized entries.
The canonical states are:
- Evaluated – Maps from evaluada.
- Applied – Maps from aplicado, enviada, aplicada, or sent.
- Responded – Maps from respondido.
- Interview – Maps from entrevista.
- Offer – Maps from oferta.
- Rejected – Maps from rechazado or rechazada.
- Discarded – Maps from descartado, descartada, cerrada, or cancelada.
- SKIP – Maps from no_aplicar, no aplicar, skip, or monitor.
Each entry in the tracker file applications.md (or data/applications.md) must conform to one of these eight values to be considered valid by downstream scripts like merge-tracker.mjs.
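The alias list above can be pictured as a simple lookup table. The following is a minimal sketch that hard-codes the aliases from this article instead of reading templates/states.yml; the names STATES, LOOKUP, and toCanonical are hypothetical and are not taken from the repository.

```javascript
// Hypothetical in-memory version of the alias table described above.
// The real definitions live in templates/states.yml; this shape is assumed.
const STATES = {
  Evaluated: ['evaluada'],
  Applied: ['aplicado', 'enviada', 'aplicada', 'sent'],
  Responded: ['respondido'],
  Interview: ['entrevista'],
  Offer: ['oferta'],
  Rejected: ['rechazado', 'rechazada'],
  Discarded: ['descartado', 'descartada', 'cerrada', 'cancelada'],
  SKIP: ['no_aplicar', 'no aplicar', 'skip', 'monitor'],
};

// Build a lowercase lookup: every canonical label and alias -> canonical label.
const LOOKUP = new Map();
for (const [canonical, aliases] of Object.entries(STATES)) {
  LOOKUP.set(canonical.toLowerCase(), canonical);
  for (const alias of aliases) LOOKUP.set(alias.toLowerCase(), canonical);
}

// Returns the canonical label, or null when the value is not recognized.
function toCanonical(value) {
  return LOOKUP.get(String(value).trim().toLowerCase()) ?? null;
}

console.log(toCanonical('Enviada')); // Applied
console.log(toCanonical('monitor')); // SKIP
console.log(toCanonical('pending')); // null
```

Flattening everything into one lowercase map keeps the lookup case-insensitive and makes it easy to tell a recognized value from an unknown one.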
How Statuses Are Normalized Across Tracker Entries
The normalize-statuses.mjs script performs an automated sweep of the tracker file to enforce the canonical schema. The process executes in eight distinct steps:
- Load the source file – The script selects data/applications.md if present, otherwise falling back to the legacy applications.md path (lines 19-22).
- Parse the status column – For each markdown table row, the script isolates the status field (parts[6]).
- Sanitize formatting – The script removes markdown bold syntax (**) using s.replace(/\*\*/g, '') to ensure clean string matching.
- Map to canonical values – The normalizeStatus(raw) function (lines 28-86) handles the core logic:
  - Detects specific markers like DUPLICADO, CERRADA, or RECHAZADO and maps them to Discarded or Rejected.
  - Strips date suffixes (e.g., aplicado 2023) to return Applied.
  - Translates Spanish aliases (evaluada, entrevista) to English equivalents (Evaluated, Interview).
  - Validates against the canonical list (Evaluated, Applied, Responded, Interview, Offer, Rejected, Discarded, SKIP) using case-insensitive matching.
- Preserve auxiliary data – If a status like DUPLICADO contains extra text, that content is moved to the notes column (lines 26-34).
- Rewrite the row – The script reconstructs the table line with the new canonical status and cleans the score column of bold formatting (lines 36-40).
- Log changes – Every substitution is reported to the console in the format #${num}: "old" → "new".
- Atomic write with backup – Unless the --dry-run flag is set, the original file is backed up as applications.md.bak before the normalized content is written (lines 58-62).
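The mapping step in the list above can be sketched as follows. This is an illustration reconstructed from the description, not the script's actual code: the ALIASES table, the prefix matching for markers, the date pattern, and the null return for unrecognized values are all assumptions.

```javascript
// Sketch of the mapping step only; the alias table and marker rules are
// assumptions reconstructed from the description, not the script's code.
const CANONICAL = ['Evaluated', 'Applied', 'Responded', 'Interview',
  'Offer', 'Rejected', 'Discarded', 'SKIP'];

const ALIASES = {
  evaluada: 'Evaluated',
  aplicado: 'Applied', aplicada: 'Applied', enviada: 'Applied', sent: 'Applied',
  respondido: 'Responded',
  entrevista: 'Interview',
  oferta: 'Offer',
  rechazado: 'Rejected', rechazada: 'Rejected',
  descartado: 'Discarded', descartada: 'Discarded',
  cerrada: 'Discarded', cancelada: 'Discarded',
  no_aplicar: 'SKIP', 'no aplicar': 'SKIP', monitor: 'SKIP',
};

function normalizeStatus(raw) {
  // Sanitize: drop markdown bold markers and surrounding whitespace.
  const s = raw.replace(/\*\*/g, '').trim();
  const lower = s.toLowerCase();

  // Markers that may carry extra text after them (assumed prefix match).
  if (lower.startsWith('duplicado') || lower.startsWith('cerrada')) return 'Discarded';
  if (lower.startsWith('rechazado') || lower.startsWith('rechazada')) return 'Rejected';

  // Strip a trailing date such as "2023" or "2024-04-01" (assumed formats).
  const base = lower.replace(/\s+\d{4}(-\d{2}(-\d{2})?)?$/, '');

  // Translate known aliases to their canonical label.
  if (Object.hasOwn(ALIASES, base)) return ALIASES[base];

  // Case-insensitive check against the canonical list.
  const hit = CANONICAL.find((c) => c.toLowerCase() === base);
  return hit ?? null; // null: unrecognized, left for manual review (assumed)
}

console.log(normalizeStatus('**Aplicado 2024**')); // Applied
console.log(normalizeStatus('entrevista'));        // Interview
console.log(normalizeStatus('skip'));              // SKIP
```

Checking markers before stripping dates means a value such as RECHAZADO 2024 resolves through the marker rule, so the order of the checks matters.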
Practical Usage and Code Examples
You can execute the normalizer directly from the command line. Use the --dry-run flag to preview changes without modifying data.
Preview changes safely:
node normalize-statuses.mjs --dry-run
Apply normalization and create a backup:
node normalize-statuses.mjs
Input Transformation Example
Consider a tracker row containing a non‑canonical Spanish status with formatting artifacts:
| 12 | 2024-04-01 | Acme Corp | Senior Engineer | 4.5/5 | **Aplicado 2024** | ✅ | [12](reports/012-acme-2024-04-01.md) | |
After processing, the script produces a clean, canonical entry:
| 12 | 2024-04-01 | Acme Corp | Senior Engineer | 4.5/5 | Applied | ✅ | [12](reports/012-acme-2024-04-01.md) | |
In this example, the date suffix is stripped, the Spanish alias Aplicado is normalized to Applied, and the markdown bold markers are removed from the status column.
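The row-level transformation can be reproduced with a short sketch. The rewriteRow function and its stand-in normalizeStatus are hypothetical and cover only this one example. The parts[6] index, the bold stripping, and the log format follow the description above.

```javascript
// Stand-in normalizer covering only this example; the real normalizeStatus()
// handles many more cases.
function normalizeStatus(raw) {
  const base = raw.replace(/\*\*/g, '').trim().toLowerCase().replace(/\s+\d{4}$/, '');
  return base === 'aplicado' ? 'Applied' : null;
}

function rewriteRow(line) {
  // Splitting "| a | b |" on "|" yields an empty first element, so the
  // status (sixth column) lands at index 6.
  const parts = line.split('|').map((p) => p.trim());
  const num = parts[1];
  const oldStatus = parts[6];
  const newStatus = normalizeStatus(oldStatus);
  if (newStatus === null || newStatus === oldStatus) return line; // nothing to change

  parts[6] = newStatus;
  parts[5] = parts[5].replace(/\*\*/g, ''); // also clean bold from the score column
  console.log(`#${num}: "${oldStatus}" → "${newStatus}"`);

  // Rebuild the row; empty cells are rendered as a single space.
  const cells = parts.slice(1, -1);
  return '|' + cells.map((c) => (c ? ` ${c} ` : ' ')).join('|') + '|';
}

const before = '| 12 | 2024-04-01 | Acme Corp | Senior Engineer | 4.5/5 | **Aplicado 2024** | ✅ | [12](reports/012-acme-2024-04-01.md) | |';
console.log(rewriteRow(before));
```

Rows whose status the stand-in does not recognize are returned unchanged, so the function is safe to map over every line of the table.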
Safety Features and Data Integrity
The normalization workflow includes safeguards to prevent data loss. Before writing changes, the script copies the original to a backup file named applications.md.bak, allowing for manual rollback if necessary. The --dry-run mode logs every proposed substitution without altering the source file, enabling validation before execution.
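A backup-then-write guard of this kind might look like the sketch below. The saveNormalized helper and its return shape are hypothetical. The fs argument is assumed to be Node's node:fs module (or a stand-in exposing copyFileSync and writeFileSync), passed in so the logic can be exercised without touching real files.

```javascript
// Hypothetical helper: `fs` is expected to be Node's `node:fs` module (or a
// stand-in with the same two methods), passed in so the logic is easy to test.
function saveNormalized(fs, filePath, content, argv = process.argv.slice(2)) {
  const dryRun = argv.includes('--dry-run');
  if (dryRun) {
    console.log('(dry-run: no files written)');
    return { written: false, backup: null };
  }
  const backupPath = `${filePath}.bak`;
  fs.copyFileSync(filePath, backupPath);       // keep the original for rollback
  fs.writeFileSync(filePath, content, 'utf8'); // then overwrite with normalized rows
  return { written: true, backup: backupPath };
}
```

Copying before writing means a failed or unwanted run can be undone by restoring the .bak file over the tracker.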
Maintaining canonical statuses is critical for downstream consumption. Scripts such as merge-tracker.mjs assume the status column conforms to the eight authorized values; mismatched strings would break grouping logic and corrupt analytics dashboards. By centralizing the canonical definition in templates/states.yml and enforcing it via normalize-statuses.mjs, the repository ensures that internationalized entries (Spanish variants) integrate seamlessly into an English‑standardized reporting pipeline.
Summary
- Canonical statuses are defined in templates/states.yml and include eight states: Evaluated, Applied, Responded, Interview, Offer, Rejected, Discarded, and SKIP.
- The normalize-statuses.mjs script scans applications.md (or data/applications.md) and remaps any alias or malformed status to its canonical English equivalent.
- The normalizeStatus() function handles Spanish translations, date suffixes, and specific markers like DUPLICADO or CERRADA.
- Safety mechanisms include a --dry-run preview mode and an automatic .bak file creation before overwriting.
- Downstream tools like merge-tracker.mjs rely on this normalization to ensure data integrity and accurate reporting.
Frequently Asked Questions
Where are the canonical status definitions stored?
The canonical status definitions are stored in templates/states.yml in the root of the santifer/career-ops repository. This YAML file lists the eight valid states—such as Applied, Interview, and Rejected—along with their accepted aliases in other languages.
How does the normalizer handle Spanish status values?
The normalizeStatus() function in normalize-statuses.mjs includes a mapping layer that converts Spanish terms like evaluada, aplicado, and rechazado to their English canonical equivalents (Evaluated, Applied, Rejected). It performs case‑insensitive matching and strips formatting characters to ensure robust detection.
Is it possible to run the normalization without changing my data?
Yes. You can run the script with the --dry-run flag to preview all proposed changes in the console. This mode parses the file and logs every substitution (e.g., #12: "Aplicado" → "Applied") but does not write to applications.md or create a backup file.
What does the script do if a status includes a date or extra notes?
The parser automatically strips date suffixes (e.g., aplicado 2023 becomes Applied). If specific markers like DUPLICADO contain additional text, that content is moved to the notes column before the status is reset to its canonical value, ensuring no auxiliary information is lost during normalization.
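Moving marker text into the notes column could work along these lines. This is a sketch under stated assumptions: the splitMarker helper is hypothetical, and both the DUPLICADO to Discarded mapping and the "; " separator for appended notes are guesses, not details confirmed by the script.

```javascript
// Hypothetical helper: split a marker status such as "DUPLICADO #45" into a
// canonical status and leftover text destined for the notes column.
function splitMarker(rawStatus, existingNotes = '') {
  const s = rawStatus.replace(/\*\*/g, '').trim();
  const match = /^(duplicado)\b\s*(.*)$/i.exec(s);
  if (!match) return { status: null, notes: existingNotes };

  const extra = match[2].trim();
  // Append the leftover text to any existing note instead of overwriting it.
  const notes = [existingNotes.trim(), extra && `${match[1].toUpperCase()} ${extra}`]
    .filter(Boolean)
    .join('; ');
  return { status: 'Discarded', notes }; // Discarded is assumed for DUPLICADO
}

console.log(splitMarker('DUPLICADO #45', 'follow up'));
```

Keeping the marker word in the note preserves the reason the row was discarded after the status column has been reset.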