# How the Career-Ops Portal Scanner Detects and Loads ATS Providers (Greenhouse, Lever, Ashby)

> Learn how the career-ops scanner detects and loads ATS providers like Greenhouse, Lever, and Ashby. Discover dynamic module loading and URL pattern matching for seamless integration.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: internals
- Published: 2026-06-07

---

**The career-ops scanner dynamically loads provider modules from the `providers/` directory. For each company it first honors an explicit `provider` declaration in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), and otherwise calls each provider's `detect()` function to match the company's careers URL against known ATS patterns.**

The `santifer/career-ops` repository automates job tracking by scanning company career pages and aggregating listings into a unified pipeline. Understanding how the portal scanner detects and loads ATS providers is essential for extending support to new applicant tracking systems or debugging why a specific company's jobs fail to appear.

## Provider Architecture Overview

The scanner implements a **plugin-style provider system** that decouples detection logic from the core orchestration. Instead of hard-coding ATS endpoints, `scan.mjs` treats Greenhouse, Lever, Ashby, and other systems as interchangeable modules. Each provider lives as a separate ECMAScript module under `providers/` and exposes three core properties:

- **`id`** – A unique string identifier (e.g., `greenhouse`, `ashby`).
- **`detect(entry)`** – A function that examines the company's configuration and returns a truthy hit object if the ATS is recognized.
- **`fetch(entry, ctx)`** – An async function that retrieves job listings and normalizes them into a standard format.

This architecture allows the scanner to support new ATS platforms by simply dropping a new `.mjs` file into the directory without modifying `scan.mjs`.
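As a rough illustration of that contract, a provider module might look like the sketch below. The `exampleats` id, the `exampleats.test` hostnames, and the response fields (`postings`, `link`, `city`) are all hypothetical; only the `id`/`detect`/`fetch` shape and the `ctx.fetchJson()` helper come from the scanner described here.

```javascript
// Hypothetical provider for an imaginary ATS; hostnames and response fields are assumptions.
const exampleProvider = {
  id: 'exampleats',

  // Return a hit object when the careers URL looks like this ATS, otherwise null.
  detect(entry) {
    const match = (entry.careers_url || '').match(/jobs\.exampleats\.test\/([^/?#]+)/);
    return match ? { url: `https://api.exampleats.test/boards/${match[1]}` } : null;
  },

  // Fetch postings and map them to the normalized { title, url, company, location } shape.
  async fetch(entry, ctx) {
    const hit = this.detect(entry);
    if (!hit) throw new Error(`exampleats: cannot resolve API URL for ${entry.name}`);
    const json = await ctx.fetchJson(hit.url);
    return (json.postings || []).map(p => ({
      title: p.title || '',
      url: p.link || '',
      company: entry.name,
      location: p.city || '',
    }));
  },
};

// In a real module under providers/, this object would be the default export:
// export default exampleProvider;
```

Keeping `detect()` free of network calls, as in this sketch, means resolution stays cheap even when many providers are loaded.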

## Loading Provider Modules at Startup

When the scan begins, `scan.mjs` calls `loadProviders()` (lines 54–81) to discover and import every available provider:

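```javascript
import { existsSync, readdirSync } from 'node:fs';
import path from 'node:path';
import { pathToFileURL } from 'node:url';

async function loadProviders(dir) {
  const providers = new Map();
  if (!existsSync(dir)) return providers;
  const entries = readdirSync(dir)
    .filter(f => f.endsWith('.mjs') && !f.startsWith('_'))
    .sort();                       // deterministic order
  for (const file of entries) {
    const full = path.join(dir, file);
    // import() expects a URL, so convert the filesystem path first
    const mod = await import(pathToFileURL(full).href);
    const p = mod.default;
    // skip modules that lack a default export, an id, or a fetch function
    if (!p || typeof p.fetch !== 'function' || !p.id) continue;
    providers.set(p.id, p);
  }
  return providers;
}
```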

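The loader skips any file prefixed with an underscore (allowing for shared utilities like `providers/_http.mjs`) and sorts the remaining files alphabetically to ensure **deterministic resolution order**. A module is registered only if its default export has an `id` and a `fetch` function; `detect` is not checked at load time. Each valid provider is stored in a `Map` keyed by its `id`, so later lookups by id are O(1) and iteration follows the sorted load order.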
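The filename filter can be tried in isolation. The sketch below applies the same `filter` and `sort` calls to a made-up directory listing; `selectProviderFiles` is a hypothetical helper name, not a function in `scan.mjs`.

```javascript
// Mirrors the loader's filename filter: keep *.mjs, drop underscore-prefixed helpers, sort.
function selectProviderFiles(filenames) {
  return filenames
    .filter(f => f.endsWith('.mjs') && !f.startsWith('_'))
    .sort();
}

// Hypothetical directory listing.
const listing = ['lever.mjs', '_http.mjs', 'README.md', 'greenhouse.mjs', 'ashby.mjs'];
console.log(selectProviderFiles(listing));
// → [ 'ashby.mjs', 'greenhouse.mjs', 'lever.mjs' ]
```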

## Automatic Provider Detection

For each company entry defined in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), `scan.mjs` invokes `resolveProvider()` (lines 84–118) to determine which module should handle the request:

```javascript
function resolveProvider(entry, providers, { skipIds = [] } = {}) {
  // 1️⃣ explicit provider field
  if (entry.provider) {
    const p = providers.get(entry.provider);
    if (p) return { provider: p };
  }

  // 2️⃣ run each provider’s detect()
  for (const p of providers.values()) {
    if (skipIds.includes(p.id)) continue;
    const hit = p.detect?.(entry);
    if (hit) return { provider: p, hit };
  }
  return null;
}
```

The resolution strategy follows a priority chain:

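1. **Explicit declaration** – If [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml) contains `provider: greenhouse` and a module with that `id` was loaded, that module is selected immediately and `detect()` is never called.
2. **Pattern detection** – Otherwise, the scanner iterates through the loaded providers in load order (alphabetical by filename) and calls `detect(entry)` on each. The first provider that returns a truthy hit wins. The same fallback applies when the declared `provider` id does not match any loaded module.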

This design allows automatic detection of Greenhouse boards even when the configuration entry only contains a `careers_url` and no explicit `provider` field.
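The precedence rules can be exercised with stub providers. In the sketch below, `resolveProvider()` is a condensed copy of the function shown above so the demo runs on its own; the `stub` helper and its substring-based detection are assumptions made purely for illustration.

```javascript
// Condensed copy of the resolution logic shown above, so the demo runs on its own.
function resolveProvider(entry, providers, { skipIds = [] } = {}) {
  if (entry.provider) {
    const p = providers.get(entry.provider);
    if (p) return { provider: p };
  }
  for (const p of providers.values()) {
    if (skipIds.includes(p.id)) continue;
    const hit = p.detect?.(entry);
    if (hit) return { provider: p, hit };
  }
  return null;
}

// Stub providers (hypothetical): each one "detects" by substring match only.
const stub = (id, needle) => ({
  id,
  detect: entry => ((entry.careers_url || '').includes(needle) ? { url: entry.careers_url } : null),
  fetch: async () => [],
});
const providers = new Map([
  ['ashby', stub('ashby', 'ashbyhq.com')],
  ['greenhouse', stub('greenhouse', 'greenhouse.io')],
]);

const url = 'https://jobs.ashbyhq.com/acme';

// Explicit provider wins even though the URL looks like Ashby.
console.log(resolveProvider({ provider: 'greenhouse', careers_url: url }, providers).provider.id); // → greenhouse

// No provider field: detect() decides.
console.log(resolveProvider({ careers_url: url }, providers).provider.id); // → ashby

// Unknown explicit id: the lookup misses, so resolution falls through to detect().
console.log(resolveProvider({ provider: 'nope', careers_url: url }, providers).provider.id); // → ashby

// skipIds excludes a provider from detection.
console.log(resolveProvider({ careers_url: url }, providers, { skipIds: ['ashby'] })); // → null
```

Note that an explicit match returns no `hit` object, so a provider's `fetch()` cannot rely on one being present.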

## How Greenhouse Is Detected

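In `providers/greenhouse.mjs`, the `detect()` function delegates to a `resolveApiUrl()` helper. The helper returns the entry's explicit `api` URL if one is set, and otherwise matches the `careers_url` against a regular expression: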

```javascript
function resolveApiUrl(entry) {
  if (entry.api) return entry.api;                 // explicit API URL
  const url = entry.careers_url || '';
  const match = url.match(/job-boards(?:\.eu)?\.greenhouse\.io\/([^/?#]+)/);
  if (match) return `https://boards-api.greenhouse.io/v1/boards/${match[1]}/jobs`;
  return null;
}

export default {
  id: 'greenhouse',
  detect(entry) {
    try {
      const apiUrl = resolveApiUrl(entry);
      return apiUrl ? { url: apiUrl } : null;
    } catch { return null; }
  },
  async fetch(entry, ctx) {
    const apiUrl = resolveApiUrl(entry);
    const json = await ctx.fetchJson(apiUrl, { redirect: 'error' });
    return json.jobs
      .filter(j => j.absolute_url)
      .map(j => ({
        title: j.title || '',
        url: j.absolute_url,
        company: entry.name,
        location: j.location?.name || '',
      }));
  },
};
```

The regular expression captures the board slug from URLs like `https://job-boards.greenhouse.io/acmeinc` (the optional `\.eu` group also accepts `job-boards.eu.greenhouse.io`) and constructs the public **boards-api** endpoint. The `fetch()` method then queries this JSON API, drops jobs that have no `absolute_url`, and normalizes the rest into the standard schema expected by the pipeline.
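To see which URLs the pattern accepts, the sketch below runs the same regular expression against a few made-up URLs. `slugFrom` is a hypothetical helper used only for this demonstration.

```javascript
// Same pattern as in the provider excerpt above.
const GREENHOUSE_RE = /job-boards(?:\.eu)?\.greenhouse\.io\/([^/?#]+)/;

// Return the captured board slug, or null when the URL does not match.
function slugFrom(url) {
  const match = url.match(GREENHOUSE_RE);
  return match ? match[1] : null;
}

console.log(slugFrom('https://job-boards.greenhouse.io/acmeinc'));             // → acmeinc
console.log(slugFrom('https://job-boards.eu.greenhouse.io/acmeinc/jobs/123')); // → acmeinc
console.log(slugFrom('https://job-boards.greenhouse.io/acmeinc?gh_src=x'));    // → acmeinc
console.log(slugFrom('https://boards.greenhouse.io/acmeinc'));                 // → null
console.log(slugFrom('https://example.com/careers'));                          // → null
```

The fourth case shows that a URL on `boards.greenhouse.io` is not matched by this exact pattern, so such an entry would need an explicit `api` value or another provider to claim it.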

## How Ashby Is Detected

The Ashby provider (`providers/ashby.mjs`) follows a similar pattern but includes resilience logic for rate limiting:

```javascript
// ASHBY_RETRIES and ASHBY_TIMEOUT_MS are constants defined elsewhere in the module
// (their values are not shown in this excerpt).

function resolveApiUrl(entry) {
  const url = entry.careers_url || '';
  const match = url.match(/jobs\.ashbyhq\.com\/([^/?#]+)/);
  return match ? `https://api.ashbyhq.com/posting-api/job-board/${match[1]}?includeCompensation=true` : null;
}

export default {
  id: 'ashby',
  detect(entry) {
    const apiUrl = resolveApiUrl(entry);
    return apiUrl ? { url: apiUrl } : null;
  },
  async fetch(entry, ctx) {
    const apiUrl = resolveApiUrl(entry);
    let lastErr; // remember the most recent failure so it can be rethrown
    // retry with exponential back-off because Ashby is slow / rate-limited
    for (let attempt = 0; attempt <= ASHBY_RETRIES; attempt++) {
      try {
        const json = await ctx.fetchJson(apiUrl, { timeoutMs: ASHBY_TIMEOUT_MS });
        return json.jobs.map(j => ({
          title: j.title || '',
          url: j.jobUrl || '',
          company: entry.name,
          location: j.location || '',
        }));
      } catch (e) {
        lastErr = e; // retry on failure (back-off delay elided in this excerpt)
      }
    }
    throw lastErr;
  },
};
```

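The `detect()` function extracts the organization slug from `jobs.ashbyhq.com` URLs. The `fetch()` implementation wraps the HTTP call in a retry loop with exponential backoff: because the loop condition is `attempt <= ASHBY_RETRIES`, it makes up to `ASHBY_RETRIES + 1` attempts, each bounded by `ASHBY_TIMEOUT_MS`, and rethrows the last error if all of them fail. This compensates for Ashby's slower, rate-limited API.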
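The excerpt above does not show the delay between attempts. One common way to implement exponential backoff is sketched below; `withRetry`, `sleep`, and the default `retries` and `baseDelayMs` values are illustrative assumptions, not the names or values used in `providers/ashby.mjs`.

```javascript
// Sketch of retry with exponential back-off. Names and defaults are illustrative,
// not the values used in providers/ashby.mjs.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function withRetry(task, { retries = 2, baseDelayMs = 500 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastErr = err;
      // Wait baseDelayMs, then 2x, 4x, ... but not after the final attempt.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastErr;
}

// Possible usage inside a provider's fetch():
// const json = await withRetry(() => ctx.fetchJson(apiUrl, { timeoutMs: 10000 }));
```

Doubling the delay on each failure gives a rate-limited API progressively more room to recover, while the bounded attempt count keeps a single slow company from stalling the whole scan.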

## How Lever Is Detected

The Lever provider (`providers/lever.mjs`) implements the same interface. Its `detect()` function pattern-matches `lever.co` domains to identify Lever-hosted job boards, and `fetch()` calls Lever's public JSON feed, returning jobs in the same normalized `{title, url, company, location}` structure.

## End-to-End Scanning Flow

The complete detection and loading lifecycle operates as follows:

1. **Bootstrap** – `scan.mjs` executes `loadProviders()` to populate a `Map` of valid provider modules.
2. **Iteration** – For each company in `tracked_companies`, `resolveProvider()` checks for an explicit `provider` key or runs `detect()` across all modules.
3. **Fetching** – The winning provider's `fetch(entry, ctx)` method is invoked, passing a context object containing HTTP utilities like `ctx.fetchJson()`.
4. **Aggregation** – Returned job objects are deduplicated, filtered by title and location, and written to [`pipeline.md`](https://github.com/santifer/career-ops/blob/main/pipeline.md) and `scan-history.tsv`.
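The steps above can be sketched as a single loop. This is a simplified illustration, not the code in `scan.mjs`: `scanAll` is a hypothetical name, the resolver is passed in as a parameter so the sketch stands alone, de-duplication by URL is an assumption, and title/location filtering and file output are omitted.

```javascript
// Sketch of the scan loop: resolve a provider, fetch, then de-duplicate.
async function scanAll(companies, providers, resolve, ctx) {
  const seen = new Set();
  const jobs = [];
  const skipped = [];
  for (const entry of companies) {
    const resolved = resolve(entry, providers);
    if (!resolved) {
      skipped.push(entry.name); // no provider recognized this entry
      continue;
    }
    let listings;
    try {
      listings = await resolved.provider.fetch(entry, ctx);
    } catch {
      skipped.push(entry.name); // one failing company should not abort the run
      continue;
    }
    for (const job of listings) {
      if (!job.url || seen.has(job.url)) continue; // de-duplicate by URL (assumption)
      seen.add(job.url);
      jobs.push(job);
    }
  }
  return { jobs, skipped };
}
```

Returning the skipped company names alongside the jobs makes it straightforward to report which entries need an explicit `provider` or a corrected `careers_url`.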

## Summary

- **Dynamic Loading** – Provider modules are auto-discovered from `providers/` at runtime, skipping files prefixed with `_` and validating the presence of `id` and `fetch`.
- **Dual Resolution** – The scanner respects explicit `provider` declarations in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), otherwise falling back to automatic detection via regex-based `detect()` functions.
- **Pattern Matching** – Greenhouse targets `job-boards.greenhouse.io`, Ashby targets `jobs.ashbyhq.com`, and Lever targets `lever.co` URLs to extract API endpoints.
- **Normalization** – Every provider transforms proprietary JSON schemas into a uniform job object containing `title`, `url`, `company`, and `location`.
- **Extensibility** – Adding support for a new ATS requires only a single file exporting `id`, `detect()`, and `fetch()`; no changes to `scan.mjs` are necessary.

## Frequently Asked Questions

### How do I force the scanner to use a specific ATS provider for a company?

Add the `provider` key to the company's entry in [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml). For example, setting `provider: greenhouse` causes `resolveProvider()` to skip automatic detection and select the already-loaded Greenhouse module immediately. The value must match a loaded provider's `id` exactly; if it does not, the lookup misses and the scanner falls back to automatic detection. This override is useful when a company's careers URL is ambiguous or when testing a new provider implementation.

### Why does Ashby scanning take longer than Greenhouse?

According to the source code in `providers/ashby.mjs`, the Ashby implementation includes a retry loop with exponential backoff using `ASHBY_RETRIES` and `ASHBY_TIMEOUT_MS` constants. This compensates for Ashby's stricter rate limits and slower API responses, whereas Greenhouse fetches proceed with a single `fetchJson` call and `redirect: 'error'` handling.

### Can I add support for a custom or internal ATS?

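Yes. Create a new file such as `providers/custom.mjs` that exports a default object with three properties: a unique `id` string, a `detect(entry)` function returning a hit object when the careers URL matches your pattern, and an async `fetch(entry, ctx)` function returning an array of standardized job objects. The file name must end in `.mjs` and must not start with `_`, or the loader will ignore it. The scanner will automatically import and evaluate your module on the next run. Because detection runs in alphabetical file order and the first match wins, keep your `detect()` pattern specific enough that it does not claim URLs belonging to another provider.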

### What happens if no provider detects a company?

If `resolveProvider()` exhausts all providers without a match and no explicit provider is configured, it returns `null`. The scanner logs this condition and skips the company for that run, ensuring that unrecognized URLs do not cause fatal errors or corrupt the output pipeline.