# How the Provider Resolution Algorithm Works in Career-Ops: Understanding the Local-Parser Fallback

> Understand the Career-Ops provider resolution algorithm. Learn how the local-parser fallback handles unrecognized job boards with custom scripts for flexible career site integration.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: internals
- Published: 2026-06-07

---

**The provider resolution algorithm in Career-Ops iterates through built-in provider modules to find the first match via their `detect()` function, falling back to the generic `local-parser` module when no built-in provider recognizes a job board, allowing custom scripts to handle arbitrary career site formats.**

Career-Ops, an open-source job board aggregator maintained by santifer, uses a modular provider system to handle diverse recruiting platforms like Greenhouse, Lever, and Ashby. When scanning company entries from [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), the system must determine which code module understands each career site's structure. This article explains the **provider resolution algorithm** implemented in `scan.mjs` and how the **local-parser fallback** enables integration with unsupported or custom job boards.

## How Provider Resolution Works

The resolution process begins in `scan.mjs` by dynamically importing every module from the `providers/` directory, excluding files prefixed with underscores (such as `providers/_http.mjs`). For each company entry in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), the system queries these modules sequentially until one claims the entry.

### The Detection Sequence

The algorithm follows a strict priority order:

1. **Module Discovery** – `loadProviders()` gathers all `.mjs` files from `providers/` via dynamic import.
2. **Sequential Probing** – For each entry, it calls `prov.detect(entry)` in filesystem discovery order.
3. **First-Match Wins** – The first provider returning a non-null detection object wins the assignment.
4. **Explicit Fallback** – If the loop completes without a match, `scan.mjs` explicitly imports `providers/local-parser.mjs` as the final attempt.

The detection object must contain at least a `url` field, which may be the actual careers URL or a placeholder like `"local-parser"`.
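The discovery rule in step 1 can be sketched as a plain filter over a directory listing. This is a minimal illustration, not the code from `scan.mjs`: the helper name `selectProviderFiles` is hypothetical, and the explicit sort is an assumption made here so that the probing order is deterministic.

```javascript
// Hypothetical helper: decide which files in providers/ count as providers.
// Assumption: a provider is any .mjs file whose name does not start with "_".
function selectProviderFiles(fileNames) {
  return fileNames
    .filter((name) => name.endsWith('.mjs') && !name.startsWith('_'))
    // Assumption: sort so that detect() is probed in a stable order.
    .sort();
}

const listing = ['lever.mjs', '_http.mjs', 'README.md', 'greenhouse.mjs', 'ashby.mjs'];
console.log(selectProviderFiles(listing));
// [ 'ashby.mjs', 'greenhouse.mjs', 'lever.mjs' ]
```

Each surviving file name would then be passed to a dynamic `import()` to obtain the module that exposes `detect` and `fetch`.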

## The Local-Parser Fallback Explained

Located at `providers/local-parser.mjs`, this module acts as a universal adapter for external parsing scripts. It implements the standard `detect`/`fetch` contract while delegating actual HTML processing or API calls to user-supplied executables written in Python, Node.js, shell scripts, or any other language.

### Detection Logic

The `detect(entry)` function (lines 10-16) strictly validates two conditions before claiming an entry:

- The entry must define `parser.command` in its YAML configuration.
- If the command references a script file, `existsSync` must confirm the file exists on disk.

Only when both conditions pass does it return `{ url: entry.careers_url || 'local-parser' }`. If the script file is missing, detection fails silently, allowing other providers to attempt matching.
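The two conditions can be expressed as a small pure function. The sketch below is an approximation under stated assumptions: `detectLocalParser` and `SCRIPT_EXTENSIONS` are hypothetical names, the rule for recognizing a script path (matching on file extension) is a guess at how "references a script file" might be decided, and `fileExists` is injected in place of `existsSync` so the logic can be exercised without touching the disk.

```javascript
// Assumption: file extensions that mark an argument as a script path.
const SCRIPT_EXTENSIONS = ['.py', '.js', '.mjs', '.sh'];

// Sketch of the detection rules. `fileExists` stands in for existsSync.
function detectLocalParser(entry, fileExists) {
  const parser = entry.parser;
  if (!parser || typeof parser.command !== 'string' || parser.command === '') {
    return null; // condition 1: parser.command must be defined
  }
  const args = Array.isArray(parser.args) ? parser.args : [];
  const scripts = [parser.command, ...args].filter(
    (arg) => typeof arg === 'string' && SCRIPT_EXTENSIONS.some((ext) => arg.endsWith(ext))
  );
  if (!scripts.every((path) => fileExists(path))) {
    return null; // condition 2: every referenced script file must exist
  }
  return { url: entry.careers_url || 'local-parser' };
}

// Usage with a fake file system containing a single script.
const onDisk = new Set(['scripts/acme_csv_parser.py']);
const entry = {
  name: 'AcmeCo',
  careers_url: 'https://acme.example.com/jobs.csv',
  parser: { command: 'python3', args: ['scripts/acme_csv_parser.py', '{careers_url}'] },
};
console.log(detectLocalParser(entry, (path) => onDisk.has(path)));
// { url: 'https://acme.example.com/jobs.csv' }
```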

### Script Execution and Normalization

When selected, the `fetch(entry)` method invokes `runLocalParser(entry)` (lines 76-100), which:

1. Builds command-line arguments via `buildParserArgs`, substituting placeholders like `{careers_url}`.
2. Spawns the process using `execFileAsync` with configurable timeouts.
3. Parses stdout as JSON and normalizes results through `normalizeParserJob` (lines 58-74).
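Step 1 is essentially string templating over `parser.args`. A minimal sketch follows, assuming placeholders take the form `{field}` and refer to top-level string fields of the entry; leaving unknown placeholders untouched is a choice made here for illustration, not confirmed behavior of the real `buildParserArgs`.

```javascript
// Sketch: replace "{key}" placeholders in parser.args with entry fields.
// Assumption: unknown placeholders are left as-is rather than erased.
function buildParserArgs(entry) {
  const args = Array.isArray(entry.parser?.args) ? entry.parser.args : [];
  return args.map((arg) =>
    String(arg).replace(/\{(\w+)\}/g, (match, key) => {
      const known = Object.prototype.hasOwnProperty.call(entry, key);
      return known && typeof entry[key] === 'string' ? entry[key] : match;
    })
  );
}

const entry = {
  careers_url: 'https://acme.example.com/jobs.csv',
  parser: { command: 'python3', args: ['scripts/acme_csv_parser.py', '{careers_url}'] },
};
console.log(buildParserArgs(entry));
// [ 'scripts/acme_csv_parser.py', 'https://acme.example.com/jobs.csv' ]
```

Because the arguments are passed to the child process as an array rather than interpolated into a shell string, a substituted URL cannot inject additional shell commands.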

The provider accepts various output schemas, coercing alternative field names such as `jobUrl`, `apply_url`, or `link` into the canonical shape of `title`, `url`, `company`, and `location`.
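The coercion can be pictured as a lookup over alias lists. The following is a sketch only: the alias sets are limited to the names mentioned in this article, and the fallbacks (using the entry's `name` as the company, an empty string for a missing location, and dropping jobs without a title or URL) are assumptions rather than documented behavior of `normalizeParserJob`.

```javascript
// Return the first non-empty string found under any of the given keys.
function pick(obj, keys) {
  for (const key of keys) {
    const value = obj[key];
    if (typeof value === 'string' && value.trim() !== '') return value.trim();
  }
  return null;
}

// Sketch of normalization into { title, url, company, location }.
function normalizeParserJob(raw, entry) {
  const title = pick(raw, ['title', 'job_title']);
  const url = pick(raw, ['url', 'jobUrl', 'apply_url', 'link']);
  if (!title || !url) return null; // assumption: unusable jobs are dropped
  return {
    title,
    url,
    company: pick(raw, ['company']) ?? entry.name ?? '', // assumed fallback
    location: pick(raw, ['location']) ?? '',
  };
}

const job = normalizeParserJob(
  { job_title: 'Backend Engineer', apply_url: 'https://acme.example.com/jobs/123' },
  { name: 'AcmeCo' }
);
console.log(job);
// title 'Backend Engineer', url 'https://acme.example.com/jobs/123',
// company 'AcmeCo', location ''
```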

### Resource Limits and Safety

The fallback respects safety boundaries defined by the constants `LOCAL_PARSER_TIMEOUT_MS` and `LOCAL_PARSER_MAX_BUFFER_BYTES` (lines 10-12), which default to **20,000 ms** of execution time and **2,000,000 bytes** of output buffer. Users can override these per entry via `parser.timeout_ms` and `parser.max_buffer_bytes` in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml). These limits stop a runaway script from stalling the scan or flooding the main Node.js process with output.
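As an illustration of how the per-entry overrides might combine with the defaults, the sketch below maps the YAML fields onto the `timeout` and `maxBuffer` options accepted by Node's `child_process.execFile`. The helper names `positiveIntOr` and `resolveParserLimits` are hypothetical, and falling back to the default when an override is not a positive integer is an assumption.

```javascript
const LOCAL_PARSER_TIMEOUT_MS = 20_000;
const LOCAL_PARSER_MAX_BUFFER_BYTES = 2_000_000;

// Use the override only when it is a positive integer (assumed validation).
function positiveIntOr(value, fallback) {
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// Build the options object that would be handed to execFile.
function resolveParserLimits(parser) {
  const cfg = parser ?? {};
  return {
    timeout: positiveIntOr(cfg.timeout_ms, LOCAL_PARSER_TIMEOUT_MS),
    maxBuffer: positiveIntOr(cfg.max_buffer_bytes, LOCAL_PARSER_MAX_BUFFER_BYTES),
  };
}

console.log(resolveParserLimits({ timeout_ms: 15000 }));
// { timeout: 15000, maxBuffer: 2000000 }
```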

## Implementation in scan.mjs

The core resolution logic lives in `scan.mjs`. The simplified flow demonstrates how providers are loaded and selected:

```javascript
// Load every provider (skip files that start with "_")
const providers = await loadProviders();

// Iterate until first match
let chosen = null;
for (const prov of providers) {
  const detection = prov.detect(entry);
  if (detection) {
    chosen = { provider: prov, detection };
    break;
  }
}

// Fallback to local-parser if needed
if (!chosen) {
  const local = await import('./providers/local-parser.mjs');
  const detection = local.default.detect(entry);
  if (detection) chosen = { provider: local.default, detection };
}

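// Fetch jobs using the selected provider.
// If nothing claimed the entry, skip it instead of dereferencing null
// (treating it as "no jobs" is an assumption of this simplified flow).
const jobs = chosen ? await chosen.provider.fetch(entry) : [];
```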

This architecture isolates external scripts in child processes, ensuring that parsing errors or infinite loops in custom code cannot destabilize the scanning pipeline.

## Practical Configuration Examples

### Parsing a Custom CSV Endpoint

For a company exposing jobs via CSV rather than a standard API:

```yaml
- name: AcmeCo
  careers_url: https://acme.example.com/jobs.csv
  parser:
    command: python3
    args:
      - scripts/acme_csv_parser.py
      - '{careers_url}'
    timeout_ms: 15000
```

The Python script must output a JSON array to stdout:

```json
[
  {"title": "Backend Engineer", "url": "https://acme.example.com/jobs/123", "location": "Remote"},
  {"title": "Data Scientist", "url": "https://acme.example.com/jobs/456", "location": "Berlin"}
]
```

### Overriding Built-in Providers

Even if a company's hostname matches a built-in pattern (like Greenhouse), you can force the generic fallback:

```yaml
- name: WeirdCo
  careers_url: https://weirdco.com/jobs
  parser:
    command: node
    args:
      - ./parsers/weirdco.js
```

The presence of `parser.command` makes the `local-parser` provider win the detection step, ensuring your custom logic runs regardless of standard heuristics.

## Summary

- **Provider resolution** operates via sequential detection: `scan.mjs` loads all modules from `providers/`, calls `detect(entry)` on each, and stops at the first non-null response.
- **Underscore-prefixed files** (like `providers/_http.mjs`) are excluded from automatic loading, serving as shared utilities rather than standalone providers.
- **Local-parser fallback** activates only when built-in providers fail to detect an entry, the YAML configuration defines `parser.command`, and any script file the command references exists on disk.
- **Execution safety** is enforced through timeouts (default 20s), buffer limits (default 2MB), and child process isolation via `execFileAsync`.
- **Output flexibility** allows custom scripts to use various JSON schemas, which `normalizeParserJob` coerces into the standard job shape required by Career-Ops.

## Frequently Asked Questions

### What happens if two providers both detect the same entry?

The **first provider in filesystem order** wins. Since `scan.mjs` breaks the detection loop immediately upon receiving a non-null result, subsequent providers are never consulted. To prioritize a specific provider, ensure its filename sorts earlier alphabetically, or use the `local-parser` fallback with an explicit `parser.command` to bypass built-in detectors entirely.

### Can I use the local-parser for sites that match built-in providers?

Yes. By defining a `parser` block in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), you override the standard resolution algorithm. The `local-parser.mjs` detection logic checks for `parser.command` before returning a match, effectively taking precedence over Greenhouse, Lever, or Ashby detectors even when the URL patterns would normally trigger those providers.

### What JSON format must my custom parser output?

Your script must print a JSON array (or an object with `jobs`/`results` fields) to stdout. Each job object should contain `title` and `url` at minimum, though `location` and `company` are recommended. The `normalizeParserJob` function in `local-parser.mjs` automatically maps common alternatives like `jobUrl`, `apply_url`, `link`, or `job_title` to the canonical field names.
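The accepted envelopes can be summarized in a few lines. This is a sketch of the described behavior, with the hypothetical helper name `extractJobs` and the assumption that any other shape is treated as an empty result rather than an error.

```javascript
// Sketch: pull the job list out of parsed parser output.
// Accepts a bare array, or an object with a `jobs` or `results` array.
function extractJobs(parsed) {
  if (Array.isArray(parsed)) return parsed;
  if (parsed && typeof parsed === 'object') {
    if (Array.isArray(parsed.jobs)) return parsed.jobs;
    if (Array.isArray(parsed.results)) return parsed.results;
  }
  return []; // assumption: unrecognized shapes yield no jobs
}

const stdout = '{"jobs": [{"title": "Data Scientist", "url": "https://acme.example.com/jobs/456"}]}';
console.log(extractJobs(JSON.parse(stdout)).length);
// 1
```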

### How do I debug a failing local-parser script?

First, verify that the script path in `args` is relative to the project root and that the file exists (the `detect()` function specifically checks `existsSync` and will skip to the next provider if missing). Then run your command manually with the same arguments Career-Ops uses, ensuring it exits with code 0 and outputs valid JSON within the configured `timeout_ms` and `max_buffer_bytes` limits.