How the Career-Ops Portal Scanner Detects and Loads ATS Providers (Greenhouse, Lever, Ashby)
The career-ops scanner dynamically loads provider modules from the providers/ directory, then resolves each company in one of two ways: an explicit provider declaration in portals.yml takes priority, and otherwise each provider's detect() function is run against the company's careers URL to match known ATS patterns.
The santifer/career-ops repository automates job tracking by scanning company career pages and aggregating listings into a unified pipeline. Understanding how the portal scanner detects and loads ATS providers is essential for extending support to new applicant tracking systems or debugging why a specific company's jobs fail to appear.
Provider Architecture Overview
The scanner implements a plugin-style provider system that decouples detection logic from the core orchestration. Instead of hard-coding ATS endpoints, scan.mjs treats Greenhouse, Lever, Ashby, and other systems as interchangeable modules. Each provider lives as a separate ECMAScript module under providers/ and exposes three core properties:
- id – A unique string identifier (e.g., greenhouse, ashby).
- detect(entry) – A function that examines the company's configuration and returns a truthy hit object if the ATS is recognized.
- fetch(entry, ctx) – An async function that retrieves job listings and normalizes them into a standard format.
This architecture allows the scanner to support new ATS platforms by simply dropping a new .mjs file into the directory without modifying scan.mjs.
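As a rough illustration of that contract, a provider for a made-up ATS might look like the sketch below. The `exampleats` id, the `jobs.exampleats.test` domain, the API path, and the response field names (`postings`, `name`, `link`, `city`) are all invented for illustration; only the `id` / `detect` / `fetch` shape comes from the description above.

```javascript
// Hypothetical provider. In a real module this object would be the default
// export (`export default exampleProvider`) of a file under providers/.
const exampleProvider = {
  id: 'exampleats',

  // Return a truthy "hit" when the careers URL looks like this ATS, else null.
  detect(entry) {
    const match = (entry.careers_url || '').match(/jobs\.exampleats\.test\/([^/?#]+)/);
    return match ? { url: `https://api.exampleats.test/boards/${match[1]}` } : null;
  },

  // Fetch listings through the scanner-supplied context and normalize them.
  async fetch(entry, ctx) {
    const hit = exampleProvider.detect(entry);
    if (!hit) throw new Error(`exampleats: cannot derive an API URL for ${entry.name}`);
    const json = await ctx.fetchJson(hit.url);
    return (json.postings || []).map(p => ({
      title: p.name || '',
      url: p.link || '',
      company: entry.name,
      location: p.city || '',
    }));
  },
};
```

Because `fetch()` only talks to the network through `ctx.fetchJson`, a provider written this way can be exercised with a fake context object in tests.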
Loading Provider Modules at Startup
When the scan begins, scan.mjs calls loadProviders() (lines 54–81) to discover and import every available provider:
import { existsSync, readdirSync } from 'node:fs';
import path from 'node:path';
import { pathToFileURL } from 'node:url';

async function loadProviders(dir) {
  const providers = new Map();
  if (!existsSync(dir)) return providers;
  const entries = readdirSync(dir)
    .filter(f => f.endsWith('.mjs') && !f.startsWith('_')) // skip shared helpers
    .sort(); // deterministic order
  for (const file of entries) {
    const full = path.join(dir, file);
    const mod = await import(pathToFileURL(full).href);
    const p = mod.default;
    // ignore modules that do not export a usable provider
    if (!p || typeof p.fetch !== 'function' || !p.id) continue;
    providers.set(p.id, p);
  }
  return providers;
}
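The loader skips any file prefixed with an underscore (allowing for shared utilities like providers/_http.mjs) and sorts the remaining filenames with the default sort() ordering, so resolution order does not depend on how the filesystem lists the directory. Modules whose default export lacks an id or a fetch function are skipped without error. Each remaining provider is stored in a Map keyed by its id, which gives O(1) lookups by id and, because a Map iterates in insertion order, preserves the sorted order for the detection pass.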
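The filename filter can be tried in isolation. The sketch below copies only the filter-and-sort step into a helper named `selectProviderFiles` (a name introduced here, not one from the repository) and runs it on a hypothetical directory listing.

```javascript
// Mirrors the loader's filename filter: keep *.mjs, drop underscore-prefixed
// helpers, then sort so the result does not depend on directory listing order.
function selectProviderFiles(fileNames) {
  return fileNames
    .filter(f => f.endsWith('.mjs') && !f.startsWith('_'))
    .sort();
}

// Hypothetical directory listing, in arbitrary order.
const listing = ['lever.mjs', '_http.mjs', 'README.md', 'greenhouse.mjs', 'ashby.mjs'];
console.log(selectProviderFiles(listing));
// → [ 'ashby.mjs', 'greenhouse.mjs', 'lever.mjs' ]
```

With this listing, Ashby's `detect()` would be consulted before Greenhouse's, and Greenhouse's before Lever's.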
Automatic Provider Detection
For each company entry defined in portals.yml, scan.mjs invokes resolveProvider() (lines 84–118) to determine which module should handle the request:
function resolveProvider(entry, providers, { skipIds = [] } = {}) {
  // 1. explicit provider field
  if (entry.provider) {
    const p = providers.get(entry.provider);
    if (p) return { provider: p };
  }
  // 2. run each provider's detect()
  for (const p of providers.values()) {
    if (skipIds.includes(p.id)) continue;
    const hit = p.detect?.(entry);
    if (hit) return { provider: p, hit };
  }
  return null;
}
The resolution strategy follows a priority chain:
- Explicit declaration – If portals.yml contains provider: greenhouse and a provider with that id was loaded, that module is selected immediately. If the declared id is unknown, resolution falls through to the next step.
- Pattern detection – Otherwise, the scanner iterates through the loaded providers in sorted order (skipping any id listed in skipIds) and calls detect(entry). The first provider returning a truthy value wins.
This design allows automatic detection of Greenhouse boards even when the configuration entry only contains a careers_url and no explicit provider field.
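To see the priority chain in action, the sketch below repeats resolveProvider() from above and runs it against two stub providers. The stubs, the `stub` factory, and the four sample entries are invented for demonstration; real entries come from portals.yml and real providers from providers/.

```javascript
function resolveProvider(entry, providers, { skipIds = [] } = {}) {
  if (entry.provider) {
    const p = providers.get(entry.provider);
    if (p) return { provider: p };
  }
  for (const p of providers.values()) {
    if (skipIds.includes(p.id)) continue;
    const hit = p.detect?.(entry);
    if (hit) return { provider: p, hit };
  }
  return null;
}

// Stub factory: "detects" any careers_url that contains the given host.
const stub = (id, host) => ({
  id,
  detect: entry => ((entry.careers_url || '').includes(host) ? { url: entry.careers_url } : null),
  fetch: async () => [],
});

const providers = new Map([
  ['ashby', stub('ashby', 'jobs.ashbyhq.com')],
  ['greenhouse', stub('greenhouse', 'job-boards.greenhouse.io')],
]);

// Hypothetical entries, shaped like parsed portals.yml items.
const explicit = { name: 'Acme', provider: 'greenhouse', careers_url: 'https://acme.example/careers' };
const detected = { name: 'Beta', careers_url: 'https://jobs.ashbyhq.com/beta' };
const unknown = { name: 'Gamma', careers_url: 'https://gamma.example/jobs' };
const typo = { name: 'Delta', provider: 'greenhose', careers_url: 'https://job-boards.greenhouse.io/delta' };

console.log(resolveProvider(explicit, providers).provider.id); // greenhouse (explicit, no hit object)
console.log(resolveProvider(detected, providers).provider.id); // ashby (via detect)
console.log(resolveProvider(unknown, providers));              // null
console.log(resolveProvider(typo, providers).provider.id);     // greenhouse (unknown id falls through to detect)
```

Note that the explicit path returns no `hit`, so a provider's `fetch()` cannot assume that `detect()` has already run for the entry it receives.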
How Greenhouse Is Detected
In providers/greenhouse.mjs, the detect() function relies on a regex pattern match against the careers_url:
function resolveApiUrl(entry) {
  if (entry.api) return entry.api; // explicit API URL
  const url = entry.careers_url || '';
  const match = url.match(/job-boards(?:\.eu)?\.greenhouse\.io\/([^/?#]+)/);
  if (match) return `https://boards-api.greenhouse.io/v1/boards/${match[1]}/jobs`;
  return null;
}

export default {
  id: 'greenhouse',
  detect(entry) {
    try {
      const apiUrl = resolveApiUrl(entry);
      return apiUrl ? { url: apiUrl } : null;
    } catch { return null; }
  },
  async fetch(entry, ctx) {
    const apiUrl = resolveApiUrl(entry);
    const json = await ctx.fetchJson(apiUrl, { redirect: 'error' });
    return json.jobs
      .filter(j => j.absolute_url)
      .map(j => ({
        title: j.title || '',
        url: j.absolute_url,
        company: entry.name,
        location: j.location?.name || '',
      }));
  },
};
The regular expression captures the board slug from URLs like https://job-boards.greenhouse.io/acmeinc and constructs the public boards-api endpoint. The fetch() method then queries this JSON API, filters out entries missing URLs, and normalizes the response into the standard schema expected by the pipeline.
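The pattern is easy to probe on its own. The sketch below lifts the same regular expression into a helper called `greenhouseSlug` (a name introduced here for illustration) and runs it against a few sample URLs, including one on a different Greenhouse host that this particular pattern does not cover.

```javascript
// Same pattern as in resolveApiUrl above, isolated for experimentation.
const GREENHOUSE_BOARD = /job-boards(?:\.eu)?\.greenhouse\.io\/([^/?#]+)/;

function greenhouseSlug(careersUrl) {
  const match = (careersUrl || '').match(GREENHOUSE_BOARD);
  return match ? match[1] : null;
}

console.log(greenhouseSlug('https://job-boards.greenhouse.io/acmeinc'));             // acmeinc
console.log(greenhouseSlug('https://job-boards.eu.greenhouse.io/acmeinc?gh_src=x')); // acmeinc
console.log(greenhouseSlug('https://job-boards.greenhouse.io/acmeinc/jobs/123'));    // acmeinc
console.log(greenhouseSlug('https://boards.greenhouse.io/acmeinc'));                 // null
console.log(greenhouseSlug('https://acme.example/careers'));                         // null
```

For URLs the pattern does not recognize, the `entry.api` branch in resolveApiUrl() offers a way to supply the API URL directly.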
How Ashby Is Detected
The Ashby provider (providers/ashby.mjs) follows a similar pattern but includes resilience logic for rate limiting:
// ASHBY_RETRIES and ASHBY_TIMEOUT_MS are constants defined elsewhere in the module.
function resolveApiUrl(entry) {
  const url = entry.careers_url || '';
  const match = url.match(/jobs\.ashbyhq\.com\/([^/?#]+)/);
  return match ? `https://api.ashbyhq.com/posting-api/job-board/${match[1]}?includeCompensation=true` : null;
}

export default {
  id: 'ashby',
  detect(entry) {
    const apiUrl = resolveApiUrl(entry);
    return apiUrl ? { url: apiUrl } : null;
  },
  async fetch(entry, ctx) {
    const apiUrl = resolveApiUrl(entry);
    let lastErr; // remembers the most recent failure so it can be rethrown
    // retry with exponential back-off because Ashby is slow / rate-limited
    // (the delay between attempts is not shown in this excerpt)
    for (let attempt = 0; attempt <= ASHBY_RETRIES; attempt++) {
      try {
        const json = await ctx.fetchJson(apiUrl, { timeoutMs: ASHBY_TIMEOUT_MS });
        return json.jobs.map(j => ({
          title: j.title || '',
          url: j.jobUrl || '',
          company: entry.name,
          location: j.location || '',
        }));
      } catch (e) { lastErr = e; /* retry on failure */ }
    }
    throw lastErr;
  },
};
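The detect() function extracts the organization slug from jobs.ashbyhq.com URLs. The fetch() implementation wraps the HTTP call in a retry loop with exponential backoff, using the constants ASHBY_RETRIES and ASHBY_TIMEOUT_MS to cope with Ashby's slower, rate-limited API. The excerpt above is abridged: the delay between attempts is not shown, and if every attempt fails the last error is rethrown.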
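For readers unfamiliar with the pattern, here is a generic retry helper with exponential back-off. It is a sketch, not the code from providers/ashby.mjs: the `withRetry` and `sleep` names, the option names, and the doubling delay formula are all assumptions made for illustration.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Run `task` up to retries + 1 times, doubling the wait after each failure.
async function withRetry(task, { retries = 2, baseDelayMs = 500 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastErr = err;
      // Wait baseDelayMs, then 2x, then 4x, ... but not after the final attempt.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastErr; // every attempt failed
}

// Usage with a fake task that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`transient failure #${calls}`);
  return { jobs: [] };
};

withRetry(flaky, { retries: 2, baseDelayMs: 10 })
  .then(result => console.log('succeeded after', calls, 'calls:', result));
```

In a provider, the task would be the `ctx.fetchJson(...)` call, so a slow or rate-limited response gets a few spaced-out chances before the company is reported as failed.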
How Lever Is Detected
The Lever provider (providers/lever.mjs) implements an identical interface. Its detect() function pattern-matches lever.co domains to identify Lever-hosted job boards, and fetch() calls Lever's public JSON feed, returning jobs in the same normalized {title, url, company, location} structure.
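The article does not reproduce providers/lever.mjs, so the following is only a guess at what the two halves might look like. It assumes careers URLs of the form `jobs.lever.co/<site>` and Lever's public postings endpoint, whose posting objects carry `text`, `hostedUrl`, and `categories.location`; the function names `leverApiUrl` and `normalizeLeverPostings` are invented here and the repository's actual regex and field handling may differ.

```javascript
// Assumed detection rule: take the site slug from a jobs.lever.co URL.
function leverApiUrl(entry) {
  const match = (entry.careers_url || '').match(/jobs\.lever\.co\/([^/?#]+)/);
  return match ? `https://api.lever.co/v0/postings/${match[1]}?mode=json` : null;
}

// Map Lever posting objects into the normalized {title, url, company, location} shape.
function normalizeLeverPostings(postings, entry) {
  return postings
    .filter(p => p.hostedUrl) // drop postings without a public URL
    .map(p => ({
      title: p.text || '',
      url: p.hostedUrl,
      company: entry.name,
      location: p.categories?.location || '',
    }));
}

// Usage with a hand-written sample instead of a live API response.
const entry = { name: 'Acme', careers_url: 'https://jobs.lever.co/acme' };
const sample = [
  { text: 'Site Reliability Engineer', hostedUrl: 'https://jobs.lever.co/acme/abc', categories: { location: 'Berlin' } },
  { text: 'Draft posting without a URL' },
];
console.log(leverApiUrl(entry));
console.log(normalizeLeverPostings(sample, entry));
```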
End-to-End Scanning Flow
The complete detection and loading lifecycle operates as follows:
- Bootstrap – scan.mjs executes loadProviders() to populate a Map of valid provider modules.
- Iteration – For each company in tracked_companies, resolveProvider() checks for an explicit provider key or runs detect() across all modules.
- Fetching – The winning provider's fetch(entry, ctx) method is invoked, passing a context object containing HTTP utilities like ctx.fetchJson().
- Aggregation – Returned job objects are deduplicated, filtered by title and location, and written to pipeline.md and scan-history.tsv.
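The four steps can be condensed into a small loop. The sketch below is a simplification under several assumptions: `scanAll` is a name introduced here, de-duplication is keyed on the job URL, a failing fetch skips the company rather than aborting the run, and title/location filtering and file output are left out entirely. The `resolve` argument stands in for a call to resolveProvider() with the loaded provider Map.

```javascript
async function scanAll(entries, resolve, ctx) {
  const seen = new Set();
  const jobs = [];
  const skipped = [];
  for (const entry of entries) {
    const resolved = resolve(entry);
    if (!resolved) { // no explicit provider and no detect() hit
      skipped.push(entry.name);
      continue;
    }
    let found;
    try {
      found = await resolved.provider.fetch(entry, ctx);
    } catch (err) {
      skipped.push(entry.name); // assumption: one failure does not stop the scan
      continue;
    }
    for (const job of found) {
      if (seen.has(job.url)) continue; // assumption: the URL identifies a job
      seen.add(job.url);
      jobs.push(job);
    }
  }
  return { jobs, skipped };
}

// Usage with a stub provider that returns the same job twice.
const stubProvider = {
  id: 'stub',
  fetch: async entry => [
    { title: 'Engineer', url: 'https://jobs.example/1', company: entry.name, location: 'Remote' },
    { title: 'Engineer', url: 'https://jobs.example/1', company: entry.name, location: 'Remote' },
  ],
};
const demoEntries = [{ name: 'Acme', known: true }, { name: 'Mystery Co' }];
const demoResolve = entry => (entry.known ? { provider: stubProvider } : null);

scanAll(demoEntries, demoResolve, {}).then(({ jobs, skipped }) => {
  console.log(jobs.length, 'job(s); skipped:', skipped);
});
```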
Summary
- Dynamic Loading – Provider modules are auto-discovered from providers/ at runtime, skipping files prefixed with _ and validating the presence of id and fetch.
- Dual Resolution – The scanner respects explicit provider declarations in portals.yml, otherwise falling back to automatic detection via regex-based detect() functions.
- Pattern Matching – Greenhouse targets job-boards.greenhouse.io, Ashby targets jobs.ashbyhq.com, and Lever targets lever.co URLs to extract API endpoints.
- Normalization – Every provider transforms proprietary JSON schemas into a uniform job object containing title, url, company, and location.
- Extensibility – Adding support for a new ATS requires only a single file exporting id, detect(), and fetch(); no changes to scan.mjs are necessary.
Frequently Asked Questions
How do I force the scanner to use a specific ATS provider for a company?
Add the provider key to the company's entry in portals.yml. For example, setting provider: greenhouse causes resolveProvider() to skip automatic detection and select the already-loaded greenhouse module directly. The value must match the id of a loaded provider; an unknown id is ignored and detection runs as usual. This override is useful when a company's careers URL is ambiguous or when testing a new provider implementation.
Why does Ashby scanning take longer than Greenhouse?
According to the source code in providers/ashby.mjs, the Ashby implementation includes a retry loop with exponential backoff using ASHBY_RETRIES and ASHBY_TIMEOUT_MS constants. This compensates for Ashby's stricter rate limits and slower API responses, whereas Greenhouse fetches proceed with a single fetchJson call and redirect: 'error' handling.
Can I add support for a custom or internal ATS?
Yes. Create a new file providers/custom.mjs that exports a default object with three properties: a unique id string, a detect(entry) function returning a hit object when the careers URL matches your pattern, and an async fetch(entry, ctx) function returning an array of standardized job objects. The scanner will automatically import and evaluate your module on the next run.
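Before dropping a new module into providers/, it can help to check it against a canned response rather than the live ATS. The harness below is a hypothetical helper (the `smokeTestProvider` name, the `internal-ats` provider, its `careers.internal.test` host, and the `openings` response shape are all invented); it only verifies the properties the loader requires and the four normalized job fields described in this article.

```javascript
// Returns a list of problems; an empty list means the provider passed.
async function smokeTestProvider(provider, entry, cannedJson) {
  const problems = [];
  if (!provider.id) problems.push('missing id');
  if (typeof provider.fetch !== 'function') problems.push('fetch is not a function');
  if (problems.length) return problems;

  if (typeof provider.detect === 'function' && !provider.detect(entry)) {
    problems.push('detect() did not match the sample entry');
  }
  const fakeCtx = { fetchJson: async () => cannedJson }; // stand-in for the real ctx
  const jobs = await provider.fetch(entry, fakeCtx);
  if (!Array.isArray(jobs)) return [...problems, 'fetch() did not return an array'];
  for (const field of ['title', 'url', 'company', 'location']) {
    if (jobs.some(j => typeof j[field] !== 'string')) problems.push(`job.${field} is not a string`);
  }
  return problems;
}

// A made-up provider for an internal ATS, used only to exercise the harness.
const internalProvider = {
  id: 'internal-ats',
  detect: entry => ((entry.careers_url || '').includes('careers.internal.test') ? { url: entry.careers_url } : null),
  async fetch(entry, ctx) {
    const json = await ctx.fetchJson(`${entry.careers_url}/api/openings`);
    return json.openings.map(o => ({
      title: o.title || '',
      url: o.url || '',
      company: entry.name,
      location: o.office || '',
    }));
  },
};

smokeTestProvider(
  internalProvider,
  { name: 'Acme', careers_url: 'https://careers.internal.test/acme' },
  { openings: [{ title: 'Data Engineer', url: 'https://careers.internal.test/acme/42', office: 'Remote' }] },
).then(problems => console.log(problems.length ? problems : 'provider looks OK'));
```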
What happens if no provider detects a company?
If resolveProvider() exhausts all providers without a match and no explicit provider is configured, it returns null. The scanner logs this condition and skips the company for that run, ensuring that unrecognized URLs do not cause fatal errors or corrupt the output pipeline.