How the Provider Resolution Algorithm Works in Career-Ops: Understanding the Local-Parser Fallback

The provider resolution algorithm in Career-Ops calls each built-in provider module's detect() function in turn and assigns a job board to the first module that recognizes it. When no built-in provider matches, it falls back to the generic local-parser module, which lets custom scripts handle arbitrary career site formats.

Career-Ops, an open-source job board aggregator maintained by santifer, uses a modular provider system to handle diverse recruiting platforms like Greenhouse, Lever, and Ashby. When scanning company entries from portals.yml, the system must determine which code module understands each career site's structure. This article explains the provider resolution algorithm implemented in scan.mjs and how the local-parser fallback enables integration with unsupported or custom job boards.

How Provider Resolution Works

The resolution process begins in scan.mjs by dynamically importing every module from the providers/ directory, excluding files prefixed with underscores (such as providers/_http.mjs). For each company entry in portals.yml, the system queries these modules sequentially until one claims the entry.
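A minimal sketch of that discovery step follows. The filename rule (.mjs only, no leading underscore) comes from the description above; the helper names, the explicit sort, and the assumption that each provider module exposes a default export are illustrative and may not match scan.mjs.

```javascript
// Hypothetical sketch of the discovery rule: only .mjs files whose names
// do not start with "_" are treated as providers.
function isProviderFile(filename) {
  return filename.endsWith('.mjs') && !filename.startsWith('_');
}

// Sorting is an assumption added here for a deterministic probing order.
function selectProviderFiles(filenames) {
  return filenames.filter(isProviderFile).sort();
}

// Import each selected file and keep its default export (assumed shape).
async function loadProviderModules(dir) {
  const { readdir } = await import('node:fs/promises');
  const { join } = await import('node:path');
  const { pathToFileURL } = await import('node:url');
  const modules = [];
  for (const file of selectProviderFiles(await readdir(dir))) {
    const mod = await import(pathToFileURL(join(dir, file)).href);
    modules.push(mod.default);
  }
  return modules;
}

console.log(selectProviderFiles(['lever.mjs', '_http.mjs', 'greenhouse.mjs', 'README.md']));
// → [ 'greenhouse.mjs', 'lever.mjs' ]
```

The shared helper providers/_http.mjs and non-module files are filtered out before any import happens, so utilities never get probed with detect().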

The Detection Sequence

The algorithm follows a strict priority order:

  1. Module Discovery – loadProviders() gathers all .mjs files from providers/ via dynamic import.
  2. Sequential Probing – For each entry, it calls prov.detect(entry) in filesystem discovery order.
  3. First-Match Wins – The first provider returning a non-null detection object wins the assignment.
  4. Explicit Fallback – If the loop completes without a match, scan.mjs explicitly imports providers/local-parser.mjs as the final attempt.

The detection object must contain at least a url field, which may be the actual careers URL or a placeholder like "local-parser".
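The detection sequence can be exercised in isolation with mock providers. This sketch assumes providers are plain objects with a detect(entry) method; the resolveProvider name and the hostname regexes in the mocks are hypothetical, not Career-Ops code.

```javascript
// Sketch of first-match-wins resolution with an explicit fallback.
// Providers are assumed to be plain objects exposing detect(entry).
function resolveProvider(entry, providers, fallback) {
  for (const provider of providers) {
    const detection = provider.detect(entry);
    if (detection) return { provider, detection }; // first non-null wins
  }
  // No built-in match: give the fallback one last chance.
  const detection = fallback.detect(entry);
  return detection ? { provider: fallback, detection } : null;
}

// Hypothetical mock providers keyed on hostname patterns.
const greenhouse = {
  name: 'greenhouse',
  detect: (e) => (/greenhouse\.io/.test(e.careers_url ?? '') ? { url: e.careers_url } : null),
};
const lever = {
  name: 'lever',
  detect: (e) => (/lever\.co/.test(e.careers_url ?? '') ? { url: e.careers_url } : null),
};
// Mock fallback: claims any entry that defines parser.command.
const localParser = {
  name: 'local-parser',
  detect: (e) => (e.parser?.command ? { url: e.careers_url || 'local-parser' } : null),
};

const entry = {
  name: 'AcmeCo',
  careers_url: 'https://acme.example.com/jobs.csv',
  parser: { command: 'python3' },
};
console.log(resolveProvider(entry, [greenhouse, lever], localParser).provider.name);
// → local-parser
```

Returning null when nothing matches forces the caller to handle unresolved entries explicitly instead of dereferencing a missing provider.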

The Local-Parser Fallback Explained

Located at providers/local-parser.mjs, this module acts as a universal adapter for external parsing scripts. It implements the standard detect/fetch contract while delegating the actual HTML processing or API calls to user-supplied programs written in Python, JavaScript (Node.js), shell, or any other language.

Detection Logic

The detect(entry) function (lines 10-16) strictly validates two conditions before claiming an entry:

  • The entry must define parser.command in its YAML configuration.
  • If the command references a script file, existsSync must confirm the file exists on disk.

Only when both conditions pass does it return { url: entry.careers_url || 'local-parser' }. If the script file is missing, detection fails silently, allowing other providers to attempt matching.
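The two detection conditions can be sketched as a pure function with the existence check injected. How the real module decides that a command "references a script file" is not spelled out above, so the SCRIPT_PATTERN heuristic and the detectLocalParser name are assumptions.

```javascript
// Hypothetical re-creation of the two detection conditions. The existence
// check is injected (pass existsSync from node:fs in real code).
// Assumption: a command "references a script file" when it contains a path
// separator or ends in a common script extension.
const SCRIPT_PATTERN = /[\\/]|\.(py|js|mjs|cjs|sh|rb)$/i;

function detectLocalParser(entry, exists) {
  const command = entry?.parser?.command;
  // Condition 1: parser.command must be defined.
  if (!command) return null;
  // Condition 2: if it names a script file, that file must exist.
  // A missing file fails silently by returning null.
  if (SCRIPT_PATTERN.test(command) && !exists(command)) return null;
  return { url: entry.careers_url || 'local-parser' };
}

console.log(detectLocalParser({ parser: { command: 'python3' } }, () => false).url);
// → local-parser
```

In real code the second argument would be existsSync from node:fs, which resolves relative paths against the current working directory, so the scan should be started from the project root.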

Script Execution and Normalization

When selected, the fetch(entry) method invokes runLocalParser(entry) (lines 76-100), which:

  1. Builds command-line arguments via buildParserArgs, substituting placeholders like {careers_url}.
  2. Spawns the process using execFileAsync with configurable timeouts.
  3. Parses stdout as JSON and normalizes results through normalizeParserJob (lines 58-74).
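Placeholder substitution for the first step might look like the following. Only {careers_url} is documented above; support for {name} and the expandParserArgs name are assumptions of this sketch.

```javascript
// Hypothetical sketch of placeholder expansion for parser.args.
// {careers_url} is documented; {name} is an assumption for illustration.
// Unknown placeholders are left as-is.
function expandParserArgs(entry) {
  const values = {
    careers_url: entry.careers_url ?? '',
    name: entry.name ?? '',
  };
  const args = entry.parser?.args ?? [];
  return args.map((arg) =>
    String(arg).replace(/\{(\w+)\}/g, (match, key) =>
      Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
    ),
  );
}

const acme = {
  name: 'AcmeCo',
  careers_url: 'https://acme.example.com/jobs.csv',
  parser: { command: 'python3', args: ['scripts/acme_csv_parser.py', '{careers_url}'] },
};
console.log(expandParserArgs(acme));
// → [ 'scripts/acme_csv_parser.py', 'https://acme.example.com/jobs.csv' ]
```

Because the arguments go to execFile rather than through a shell, a substituted URL containing characters such as & or ? is passed verbatim instead of being interpreted by the shell.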

The provider accepts various output schemas and coerces alternative field names such as jobUrl, apply_url, or link into the canonical job shape with title, url, company, and location fields.
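The normalization idea can be sketched as follows. The alias lists come from this article; filling company from the entry's name, trimming strings, and dropping jobs without a title or url are assumptions, so treat this as an illustration rather than a copy of normalizeParserJob.

```javascript
// Sketch of normalization in the spirit of normalizeParserJob.
// Alias lists come from the article; the company fallback to entry.name
// and dropping jobs without title/url are assumptions.
const TITLE_KEYS = ['title', 'job_title'];
const URL_KEYS = ['url', 'jobUrl', 'apply_url', 'link'];

// Return the first non-empty string found under any of the given keys.
function firstString(obj, keys) {
  for (const key of keys) {
    const value = obj[key];
    if (typeof value === 'string' && value.trim() !== '') return value.trim();
  }
  return '';
}

function normalizeJobSketch(raw, entry) {
  const title = firstString(raw, TITLE_KEYS);
  const url = firstString(raw, URL_KEYS);
  if (!title || !url) return null;
  return {
    title,
    url,
    company: firstString(raw, ['company']) || entry.name || '',
    location: firstString(raw, ['location']),
  };
}

// Accept a bare array or an object wrapping the array in jobs/results.
function normalizeParserOutput(parsed, entry) {
  const list = Array.isArray(parsed) ? parsed
    : Array.isArray(parsed?.jobs) ? parsed.jobs
    : Array.isArray(parsed?.results) ? parsed.results
    : [];
  return list
    .map((raw) => normalizeJobSketch(raw ?? {}, entry))
    .filter(Boolean);
}

const sample = normalizeParserOutput(
  { results: [{ job_title: 'SRE', link: 'https://x.example/1' }] },
  { name: 'AcmeCo' },
);
console.log(sample[0].company); // → AcmeCo
```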

Resource Limits and Safety

The fallback respects safety boundaries defined by the constants LOCAL_PARSER_TIMEOUT_MS and LOCAL_PARSER_MAX_BUFFER_BYTES (lines 10-12), which default to a 20,000 ms execution timeout and a 2,000,000-byte output buffer. Users can override these per entry via parser.timeout_ms and parser.max_buffer_bytes in portals.yml. Because the script runs in a separate child process under these limits, a hung or overly verbose script is terminated instead of stalling or crashing the main Node.js process.
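These limits map naturally onto the timeout and maxBuffer options of Node's child_process.execFile. The sketch below reuses the documented defaults and YAML keys; resolveLimits, runParserCommand, and the rule that non-numeric or non-positive overrides fall back to the defaults are hypothetical.

```javascript
// Sketch: map the documented limits onto child_process.execFile options.
// Defaults and YAML keys come from the article; the helper names and the
// "fall back on non-positive or non-numeric values" rule are assumptions.
const LOCAL_PARSER_TIMEOUT_MS = 20_000;
const LOCAL_PARSER_MAX_BUFFER_BYTES = 2_000_000;

function positiveOr(value, fallback) {
  const n = Number(value);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function resolveLimits(entry) {
  return {
    timeout: positiveOr(entry.parser?.timeout_ms, LOCAL_PARSER_TIMEOUT_MS),
    maxBuffer: positiveOr(entry.parser?.max_buffer_bytes, LOCAL_PARSER_MAX_BUFFER_BYTES),
  };
}

// Run the command without a shell and parse stdout as JSON. execFile sends
// the child SIGTERM once `timeout` elapses and terminates it if stdout or
// stderr exceeds `maxBuffer`; either way the promise rejects.
async function runParserCommand(command, args, entry) {
  const { execFile } = await import('node:child_process');
  const { promisify } = await import('node:util');
  const execFileAsync = promisify(execFile);
  const { stdout } = await execFileAsync(command, args, resolveLimits(entry));
  return JSON.parse(stdout);
}

console.log(resolveLimits({ parser: { timeout_ms: 15000 } }));
// → { timeout: 15000, maxBuffer: 2000000 }
```

A rejected promise (timeout, buffer overflow, non-zero exit, or unparsable stdout) can then be caught per entry, so one broken parser only loses that company's jobs.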

Implementation in scan.mjs

The core resolution logic lives in scan.mjs. The simplified flow demonstrates how providers are loaded and selected:

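// Load every provider once (skip files that start with "_").
// loadProviders() lives in scan.mjs and is assumed to return each
// module's default export, matching how local-parser is used below.
const providers = await loadProviders();

// `entries` holds the company entries parsed from portals.yml.
for (const entry of entries) {
  // Iterate until the first provider claims the entry
  let chosen = null;
  for (const prov of providers) {
    const detection = prov.detect(entry);
    if (detection) {
      chosen = { provider: prov, detection };
      break;
    }
  }

  // Fallback to local-parser if no provider matched
  if (!chosen) {
    const local = await import('./providers/local-parser.mjs');
    const detection = local.default.detect(entry);
    if (detection) chosen = { provider: local.default, detection };
  }

  // Nothing claimed the entry: skip it rather than dereferencing null
  if (!chosen) continue;

  // Fetch jobs using the selected provider
  const jobs = await chosen.provider.fetch(entry);
}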

Because external scripts run in child processes, a crash in custom code stays confined to that process and an infinite loop is cut off by the timeout, so neither can destabilize the scanning pipeline.

Practical Configuration Examples

Parsing a Custom CSV Endpoint

For a company exposing jobs via CSV rather than a standard API:

- name: AcmeCo
  careers_url: https://acme.example.com/jobs.csv
  parser:
    command: python3
    args:
      - scripts/acme_csv_parser.py
      - '{careers_url}'
    timeout_ms: 15000

The Python script must print its results as JSON to stdout, for example as an array:

[
  {"title": "Backend Engineer", "url": "https://acme.example.com/jobs/123", "location": "Remote"},
  {"title": "Data Scientist", "url": "https://acme.example.com/jobs/456", "location": "Berlin"}
]
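The example invokes a Python script, but any language works as long as it honors the same stdout contract. As a hedged illustration in JavaScript, the core of an equivalent parser could look like this; the CSV layout (a header row with title, url and location columns, no quoted commas) is assumed, not known.

```javascript
// Hypothetical core of a CSV-to-JSON parser script. Assumes a header row
// with title, url and location columns and no quoted or escaped commas;
// a real feed may need a proper CSV library.
function csvToJobs(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = lines[0].split(',').map((h) => h.trim().toLowerCase());
  const titleIdx = header.indexOf('title');
  const urlIdx = header.indexOf('url');
  const locationIdx = header.indexOf('location');
  return lines
    .slice(1)
    .map((line) => {
      const cells = line.split(',').map((c) => c.trim());
      return {
        title: cells[titleIdx] ?? '',
        url: cells[urlIdx] ?? '',
        location: locationIdx >= 0 ? cells[locationIdx] ?? '' : '',
      };
    })
    .filter((job) => job.title && job.url);
}

// Script entry point (not invoked here): fetch the URL passed as the first
// argument, convert it, and print a JSON array to stdout. Uses the global
// fetch available in Node 18+.
async function main() {
  const res = await fetch(process.argv[2]);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${process.argv[2]}`);
  process.stdout.write(JSON.stringify(csvToJobs(await res.text())));
}
```

Such a script would be wired up with command: node and the script path plus '{careers_url}' in args, mirroring the Python entry.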

Overriding Built-in Providers

Even if a company's hostname matches a built-in pattern (like Greenhouse), you can force the generic fallback:

- name: WeirdCo
  careers_url: https://weirdco.com/jobs
  parser:
    command: node
    args:
      - ./parsers/weirdco.js

The presence of parser.command makes the local-parser provider win the detection step, ensuring your custom logic runs regardless of standard heuristics.

Summary

  • Provider resolution operates via sequential detection: scan.mjs loads all modules from providers/, calls detect(entry) on each, and stops at the first non-null response.
  • Underscore-prefixed files (like providers/_http.mjs) are excluded from automatic loading, serving as shared utilities rather than standalone providers.
  • Local-parser fallback activates only when built-in providers fail to detect an entry and the YAML configuration contains a parser.command (if that command names a script file, the file must exist on disk).
  • Execution safety is enforced through timeouts (default 20s), buffer limits (default 2MB), and child process isolation via execFileAsync.
  • Output flexibility allows custom scripts to use various JSON schemas, which normalizeParserJob coerces into the standard job shape required by Career-Ops.

Frequently Asked Questions

What happens if two providers both detect the same entry?

The first provider in filesystem order wins. Since scan.mjs breaks the detection loop immediately upon receiving a non-null result, subsequent providers are never consulted. To prioritize a specific provider, ensure its filename sorts earlier alphabetically, or use the local-parser fallback with an explicit parser.command to bypass built-in detectors entirely.

Can I use the local-parser for sites that match built-in providers?

Yes. By defining a parser block in portals.yml, you override the standard resolution algorithm. The local-parser.mjs detection logic checks for parser.command before returning a match, effectively taking precedence over Greenhouse, Lever, or Ashby detectors even when the URL patterns would normally trigger those providers.

What JSON format must my custom parser output?

Your script must print a JSON array (or an object with jobs/results fields) to stdout. Each job object should contain title and url at minimum, though location and company are recommended. The normalizeParserJob function in local-parser.mjs automatically maps common alternatives like jobUrl, apply_url, link, or job_title to the canonical field names.

How do I debug a failing local-parser script?

First, verify that the script path in args is relative to the project root and that the file exists (the detect() function specifically checks existsSync and will skip to the next provider if missing). Then run your command manually with the same arguments Career-Ops uses, ensuring it exits with code 0 and outputs valid JSON within the configured timeout_ms and max_buffer_bytes limits.
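To check a script's output before wiring it into portals.yml, you could feed its captured stdout to a small validator like the one below. checkParserStdout is a hypothetical helper that mirrors the contract described in this FAQ; it is not part of Career-Ops.

```javascript
// Hypothetical debugging helper: returns a list of problems found in a
// parser's raw stdout (an empty list means it looks acceptable).
function checkParserStdout(stdout, maxBufferBytes = 2_000_000) {
  const problems = [];
  const size = Buffer.byteLength(stdout, 'utf8');
  if (size > maxBufferBytes) {
    problems.push(`output is ${size} bytes, above ${maxBufferBytes}`);
  }
  let parsed;
  try {
    parsed = JSON.parse(stdout);
  } catch (err) {
    return [...problems, `stdout is not valid JSON: ${err.message}`];
  }
  // Accept a bare array or an object with a jobs/results array.
  const list = Array.isArray(parsed) ? parsed
    : Array.isArray(parsed?.jobs) ? parsed.jobs
    : Array.isArray(parsed?.results) ? parsed.results
    : null;
  if (!list) return [...problems, 'expected a JSON array or an object with a jobs/results array'];
  // Check the minimum fields, allowing the aliases listed above.
  list.forEach((job, i) => {
    if (!(job?.title || job?.job_title)) problems.push(`job ${i} has no title`);
    if (!(job?.url || job?.jobUrl || job?.apply_url || job?.link)) problems.push(`job ${i} has no url`);
  });
  return problems;
}

console.log(checkParserStdout('{"results":[{"title":"A"}]}'));
// → [ 'job 0 has no url' ]
```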
