How the Provider Resolution Algorithm Works in Career-Ops: Understanding the Local-Parser Fallback
The provider resolution algorithm in Career-Ops iterates through the built-in provider modules and selects the first one whose detect() function matches. When no built-in provider recognizes a job board, it falls back to the generic local-parser module, which lets custom scripts handle arbitrary career site formats.
Career-Ops, an open-source job board aggregator maintained by santifer, uses a modular provider system to handle diverse recruiting platforms like Greenhouse, Lever, and Ashby. When scanning company entries from portals.yml, the system must determine which code module understands each career site's structure. This article explains the provider resolution algorithm implemented in scan.mjs and how the local-parser fallback enables integration with unsupported or custom job boards.
How Provider Resolution Works
The resolution process begins in scan.mjs by dynamically importing every module from the providers/ directory, excluding files prefixed with underscores (such as providers/_http.mjs). For each company entry in portals.yml, the system queries these modules sequentially until one claims the entry.
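The file-selection rule can be illustrated with a small, self-contained sketch. The function name selectProviderFiles and the explicit sort are assumptions made for illustration; the article only states that .mjs files are loaded and that underscore-prefixed files are skipped.

```javascript
// Hypothetical helper: decide which filenames in providers/ count as providers.
// Assumption: only the ".mjs" extension and the leading-underscore rule matter.
function selectProviderFiles(filenames) {
  return filenames
    .filter((name) => name.endsWith('.mjs')) // provider modules only
    .filter((name) => !name.startsWith('_')) // "_"-prefixed files are shared helpers
    .sort(); // assumption: sort so the probe order is deterministic
}

const listing = ['lever.mjs', '_http.mjs', 'greenhouse.mjs', 'README.md', 'ashby.mjs'];

// Logs the three provider files: ashby.mjs, greenhouse.mjs, lever.mjs
console.log(selectProviderFiles(listing));
```

Sorting is a design choice in this sketch, not a documented behavior: directory listings are not guaranteed to come back in alphabetical order on every platform, so a loader that relies on first-match semantics benefits from a stable order.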
The Detection Sequence
The algorithm follows a strict priority order:
- Module Discovery: loadProviders() gathers all .mjs files from providers/ via dynamic import.
- Sequential Probing: For each entry, it calls prov.detect(entry) in filesystem discovery order.
- First-Match Wins: The first provider returning a non-null detection object wins the assignment.
- Explicit Fallback: If the loop completes without a match, scan.mjs explicitly imports providers/local-parser.mjs as the final attempt.
The detection object must contain at least a url field, which may be the actual careers URL or a placeholder like "local-parser".
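To make the detect/fetch contract concrete, here is a minimal sketch of what a provider module might look like. Everything about it is hypothetical (the id field, the jobs.example.com host, the empty fetch); only the idea that detect() returns an object with a url field, or null to decline, comes from the description above.

```javascript
// Hypothetical provider for an imaginary "jobs.example.com" board.
// Only the detect/fetch contract and the url field come from the article.
const exampleProvider = {
  id: 'example-board', // assumed label, not a documented field

  // Return a detection object with at least `url`, or null to decline.
  detect(entry) {
    const url = entry.careers_url || '';
    if (!url.startsWith('https://jobs.example.com/')) return null;
    return { url };
  },

  // A real provider would call the board's API here and return its jobs.
  async fetch(entry) {
    return [];
  },
};

const hit = exampleProvider.detect({ name: 'Acme', careers_url: 'https://jobs.example.com/acme' });
const miss = exampleProvider.detect({ name: 'Other', careers_url: 'https://other.example.org/jobs' });
console.log(hit, miss);
```

Returning null rather than throwing keeps the probing loop simple: a provider that does not recognize an entry just steps aside for the next one.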
The Local-Parser Fallback Explained
Located at providers/local-parser.mjs, this module acts as a universal adapter for external parsing scripts. It implements the standard detect/fetch contract while delegating the actual HTML processing or API calls to user-supplied executables written in Python, Node.js, shell, or any other language.
Detection Logic
The detect(entry) function (lines 10-16) strictly validates two conditions before claiming an entry:
- The entry must define parser.command in its YAML configuration.
- If the command references a script file, existsSync must confirm the file exists on disk.
Only when both conditions pass does it return { url: entry.careers_url || 'local-parser' }. If the script file is missing, detection fails silently, allowing other providers to attempt matching.
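The two checks can be sketched as follows. This is an approximation under stated assumptions, not the module's code: the function name detectLocalParser is made up, the rule for spotting the script argument (first argument with a script-like extension) is a guess, and the file check is injected as a parameter so the example runs without touching the disk.

```javascript
// Sketch of the two detection checks. `fileExists` is injected so the example
// runs without disk access; the real module reportedly uses existsSync.
function detectLocalParser(entry, fileExists) {
  const parser = entry.parser;
  if (!parser || !parser.command) return null; // condition 1: parser.command is set

  // Assumption: the script is the first argument ending in a script-like extension.
  const script = (parser.args || []).find((arg) => /\.(py|js|mjs|sh)$/.test(arg));
  if (script && !fileExists(script)) return null; // condition 2: script is on disk

  return { url: entry.careers_url || 'local-parser' };
}

// Pretend only this one script exists.
const onDisk = new Set(['scripts/acme_csv_parser.py']);
const exists = (path) => onDisk.has(path);

const acme = {
  careers_url: 'https://acme.example.com/jobs.csv',
  parser: { command: 'python3', args: ['scripts/acme_csv_parser.py', '{careers_url}'] },
};
const broken = { parser: { command: 'node', args: ['./parsers/missing.js'] } };

console.log(detectLocalParser(acme, exists));   // detection object with the CSV url
console.log(detectLocalParser(broken, exists)); // null, because the script is missing
```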
Script Execution and Normalization
When selected, the fetch(entry) method invokes runLocalParser(entry) (lines 76-100), which:
- Builds command-line arguments via buildParserArgs, substituting placeholders like {careers_url}.
- Spawns the process using execFileAsync with configurable timeouts.
- Parses stdout as JSON and normalizes results through normalizeParserJob (lines 58-74).
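The placeholder step might look roughly like the sketch below. The name buildArgs and the rule that any {field} token maps to a top-level string field of the entry are assumptions; the article only confirms that {careers_url} is substituted.

```javascript
// Hypothetical stand-in for buildParserArgs: replace {field} tokens in each
// argument with the matching top-level string value from the entry.
function buildArgs(entry) {
  const args = (entry.parser && entry.parser.args) || [];
  return args.map((arg) =>
    String(arg).replace(/\{(\w+)\}/g, (token, key) =>
      // Assumption: unknown tokens are left untouched rather than blanked.
      (typeof entry[key] === 'string' ? entry[key] : token)
    )
  );
}

const entry = {
  name: 'AcmeCo',
  careers_url: 'https://acme.example.com/jobs.csv',
  parser: { command: 'python3', args: ['scripts/acme_csv_parser.py', '{careers_url}'] },
};

// The second argument becomes the entry's careers_url.
console.log(buildArgs(entry));
```

Passing the result as an argument array (rather than a single shell string) is what makes an execFile-style spawn safer: values from portals.yml are never interpreted by a shell.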
The provider accepts various output schemas, coercing alternative field names such as jobUrl, apply_url, or link into the canonical shape with title, url, company, and location fields.
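A normalizer in this spirit could be sketched as below. The alias lists, the pick helper, the decision to drop jobs without a title or url, and the fallback of company to the entry name are all assumptions for illustration; consult normalizeParserJob for the actual rules.

```javascript
// Hypothetical normalizer in the spirit of normalizeParserJob. The alias lists
// are assumptions built from the field names this article mentions.
const URL_KEYS = ['url', 'jobUrl', 'apply_url', 'link'];
const TITLE_KEYS = ['title', 'job_title'];

// Return the first non-empty string found under any of the given keys.
function pick(raw, keys) {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === 'string' && value.trim() !== '') return value.trim();
  }
  return '';
}

function normalizeJob(raw, entry) {
  const title = pick(raw, TITLE_KEYS);
  const url = pick(raw, URL_KEYS);
  if (!title || !url) return null; // assumption: drop jobs missing title or url
  return {
    title,
    url,
    company: pick(raw, ['company']) || entry.name || '', // assumed fallback to entry name
    location: pick(raw, ['location']),
  };
}

const raw = { job_title: ' Backend Engineer ', apply_url: 'https://acme.example.com/jobs/123' };
console.log(normalizeJob(raw, { name: 'AcmeCo' }));
```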
Resource Limits and Safety
The fallback respects safety boundaries defined by the constants LOCAL_PARSER_TIMEOUT_MS and LOCAL_PARSER_MAX_BUFFER_BYTES (lines 10-12), which default to 20,000 ms of execution time and a 2,000,000-byte output buffer. Users can override these per entry via parser.timeout_ms and parser.max_buffer_bytes in portals.yml. These limits keep a runaway script from stalling the scan or flooding the main Node.js process with output.
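How per-entry overrides might combine with the defaults is sketched below. The helper names resolveLimits and positiveOr, and the rule that only positive finite numbers count as valid overrides, are assumptions. The returned keys timeout and maxBuffer match the option names accepted by Node's child_process.execFile.

```javascript
// Defaults as stated in the article.
const LOCAL_PARSER_TIMEOUT_MS = 20000;
const LOCAL_PARSER_MAX_BUFFER_BYTES = 2000000;

// Hypothetical helper: accept an override only if it is a positive finite number.
function positiveOr(value, fallback) {
  const n = Number(value);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Build the options object a child-process call could receive.
function resolveLimits(entry) {
  const parser = entry.parser || {};
  return {
    timeout: positiveOr(parser.timeout_ms, LOCAL_PARSER_TIMEOUT_MS),
    maxBuffer: positiveOr(parser.max_buffer_bytes, LOCAL_PARSER_MAX_BUFFER_BYTES),
  };
}

// Only the timeout is overridden; the buffer keeps its default.
console.log(resolveLimits({ parser: { command: 'python3', timeout_ms: 15000 } }));
```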
Implementation in scan.mjs
The core resolution logic lives in scan.mjs. The simplified flow demonstrates how providers are loaded and selected:
// Load every provider (skip files that start with "_")
const providers = await loadProviders();

// Iterate until first match
let chosen = null;
for (const prov of providers) {
  const detection = prov.detect(entry);
  if (detection) {
    chosen = { provider: prov, detection };
    break;
  }
}

// Fallback to local-parser if needed
if (!chosen) {
  const local = await import('./providers/local-parser.mjs');
  const detection = local.default.detect(entry);
  if (detection) chosen = { provider: local.default, detection };
}

// Fetch jobs using the selected provider.
// Assumption in this simplified flow: an entry that no provider claims yields no jobs.
const jobs = chosen ? await chosen.provider.fetch(entry) : [];
This architecture isolates external scripts in child processes, ensuring that parsing errors or infinite loops in custom code cannot destabilize the scanning pipeline.
Practical Configuration Examples
Parsing a Custom CSV Endpoint
For a company exposing jobs via CSV rather than a standard API:
- name: AcmeCo
  careers_url: https://acme.example.com/jobs.csv
  parser:
    command: python3
    args:
      - scripts/acme_csv_parser.py
      - '{careers_url}'
    timeout_ms: 15000
The Python script must output a JSON array to stdout:
[
  {"title": "Backend Engineer", "url": "https://acme.example.com/jobs/123", "location": "Remote"},
  {"title": "Data Scientist", "url": "https://acme.example.com/jobs/456", "location": "Berlin"}
]
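The example above uses a Python script, but any executable works. As a rough illustration of the conversion such a script performs, here is the core logic in JavaScript. The function csvToJobs and the CSV layout (a header row with title, url, and location columns, and no commas inside fields) are assumptions; a real feed may need a proper CSV parser.

```javascript
// Hypothetical converter for a simple CSV feed. Assumptions: the first row is a
// header, columns include title/url/location, and no field contains a comma.
function csvToJobs(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = lines[0].split(',').map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(',').map((c) => c.trim());
    const row = {};
    header.forEach((name, i) => { row[name] = cells[i] || ''; });
    return { title: row.title, url: row.url, location: row.location };
  });
}

const sample = [
  'title,url,location',
  'Backend Engineer,https://acme.example.com/jobs/123,Remote',
  'Data Scientist,https://acme.example.com/jobs/456,Berlin',
].join('\n');

// A parser script would print this array to stdout for Career-Ops to read.
console.log(JSON.stringify(csvToJobs(sample), null, 2));
```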
Overriding Built-in Providers
Even if a company's hostname matches a built-in pattern (like Greenhouse), you can force the generic fallback:
- name: WeirdCo
  careers_url: https://weirdco.com/jobs
  parser:
    command: node
    args:
      - ./parsers/weirdco.js
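With parser.command set, the local-parser provider claims the entry whenever it is consulted, so your custom logic runs instead of the standard heuristics. Because resolution is first-match, this holds only if no built-in detector claims the entry first; when the careers URL also matches a built-in pattern, confirm which provider was actually selected.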
Summary
- Provider resolution operates via sequential detection: scan.mjs loads all modules from providers/, calls detect(entry) on each, and stops at the first non-null response.
- Underscore-prefixed files (like providers/_http.mjs) are excluded from automatic loading, serving as shared utilities rather than standalone providers.
- Local-parser fallback activates only when built-in providers fail to detect an entry and the YAML configuration contains a valid parser.command pointing to an existing file.
- Execution safety is enforced through timeouts (default 20 s), buffer limits (default 2 MB), and child process isolation via execFileAsync.
- Output flexibility allows custom scripts to use various JSON schemas, which normalizeParserJob coerces into the standard job shape required by Career-Ops.
Frequently Asked Questions
What happens if two providers both detect the same entry?
The first provider in filesystem discovery order wins. Since scan.mjs breaks the detection loop immediately upon receiving a non-null result, subsequent providers are never consulted. To prioritize a specific provider, make sure it is discovered earlier (a filename that sorts earlier helps only where directory listings come back in alphabetical order), or define an explicit parser.command so that local-parser can claim the entry.
Can I use the local-parser for sites that match built-in providers?
Yes, in principle. Defining a parser block in portals.yml lets local-parser.mjs claim the entry, because its detection logic keys on parser.command rather than on URL patterns. Since resolution is first-match, check that the Greenhouse, Lever, or Ashby detectors do not claim the entry first when its URL also matches their patterns.
What JSON format must my custom parser output?
Your script must print a JSON array (or an object with jobs/results fields) to stdout. Each job object should contain title and url at minimum, though location and company are recommended. The normalizeParserJob function in local-parser.mjs automatically maps common alternatives like jobUrl, apply_url, link, or job_title to the canonical field names.
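The accepted output envelopes can be illustrated with a short sketch. The function extractJobs and its error message are hypothetical; only the three accepted shapes (a bare array, or an object with a jobs or results field) come from the answer above.

```javascript
// Hypothetical helper: accept a bare array or an object wrapping one under
// `jobs` or `results`, as described in the answer above.
function extractJobs(stdoutText) {
  const parsed = JSON.parse(stdoutText); // throws on invalid JSON
  if (Array.isArray(parsed)) return parsed;
  if (parsed && Array.isArray(parsed.jobs)) return parsed.jobs;
  if (parsed && Array.isArray(parsed.results)) return parsed.results;
  throw new Error('Parser output must be an array or contain jobs/results');
}

const bare = '[{"title": "Backend Engineer", "url": "https://acme.example.com/jobs/123"}]';
const wrapped = '{"jobs": [{"job_title": "Data Scientist", "link": "https://acme.example.com/jobs/456"}]}';

console.log(extractJobs(bare).length, extractJobs(wrapped).length);
```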
How do I debug a failing local-parser script?
First, verify that the script path in args is relative to the project root and that the file exists (the detect() function specifically checks existsSync and will skip to the next provider if missing). Then run your command manually with the same arguments Career-Ops uses, ensuring it exits with code 0 and outputs valid JSON within the configured timeout_ms and max_buffer_bytes limits.