# How the liveness-browser.mjs Classifier Distinguishes Expired from Active Job Postings

> Discover how the liveness-browser.mjs classifier distinguishes expired from active job postings using Playwright, HTTP status codes, regex, and more for accurate results.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: deep-dive
- Published: 2026-06-07

---

**The `liveness-browser.mjs` classifier orchestrates Playwright to scrape job posting pages and delegates to `liveness-core.mjs`, which evaluates HTTP status codes, URL redirect patterns, body text regexes, apply button visibility, and content length thresholds to return a deterministic `expired`, `active`, or `uncertain` verdict.**

The `liveness-browser.mjs` classifier in the `santifer/career-ops` repository verifies job posting URLs by loading each one in a Playwright-driven browser session. Understanding its logic is useful for anyone building job board aggregators or ATS monitoring tools that need to filter out stale listings. This article breaks down the signals and decision rules used to categorize postings.

## Overview of the Classification Pipeline

The classifier operates as a two-stage process. First, `liveness-browser.mjs` handles browser orchestration, safety checks, and data extraction. Then, it passes structured data to `classifyLiveness()` in `liveness-core.mjs` to execute the rule-based evaluation.

The pipeline extracts four critical data points from each page:

- **HTTP response status** captured during navigation
- **Final URL** after resolving all redirects
- **Body text** content for pattern matching
- **Visible apply controls** (buttons, links, or inputs that are not hidden and not inside navigation or footer elements)

## The Six Signal Detection Rules

According to the source code in `liveness-core.mjs`, the classifier evaluates six distinct signals in priority order and returns the result code of the first signal that matches. If none match, it falls back to an `uncertain` verdict.

### HTTP Status Code Validation

The classifier first inspects the HTTP response status. If the server returns `404` (Not Found) or `410` (Gone), the posting is immediately marked as expired.

**Result:** `{ result: 'expired', code: 'http_gone' }`

### Expired URL Pattern Matching

When job platforms remove listings, they often redirect to generic URLs containing fragments like "expired" or "closed". The classifier checks the final URL against `EXPIRED_URL_PATTERNS`.

**Result:** `{ result: 'expired', code: 'expired_url' }`
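A minimal sketch of this kind of check follows. The patterns and the function name `looksLikeExpiredUrl` are hypothetical and chosen for illustration; the actual contents of `EXPIRED_URL_PATTERNS` are defined in `liveness-core.mjs` and are not reproduced here.

```javascript
// Hypothetical patterns for illustration only; not the repository's actual list.
const EXAMPLE_EXPIRED_URL_PATTERNS = [
  /[?&]error=(expired|closed)\b/i,
  /\/(job|posting)-?(expired|closed)\b/i,
];

// Returns true when the post-redirect URL matches any expiry pattern.
function looksLikeExpiredUrl(finalUrl) {
  return EXAMPLE_EXPIRED_URL_PATTERNS.some((pattern) => pattern.test(finalUrl));
}

console.log(looksLikeExpiredUrl('https://jobs.example.com/search?error=expired')); // true
console.log(looksLikeExpiredUrl('https://jobs.example.com/role-software-engineer-12345')); // false
```

The regexes deliberately omit the `g` flag, so repeated `test()` calls do not carry `lastIndex` state between URLs.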

### Hard Expired Text Patterns

The classifier scans `bodyText` for definitive expiration phrases defined in `HARD_EXPIRED_PATTERNS`. These are hard-coded patterns for phrases that state the position is no longer available.

**Result:** `{ result: 'expired', code: 'expired_body' }`
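As a rough illustration, a body-text check of this kind could look like the sketch below. The phrases, the function name `checkExpiredBody`, and the wording of the `reason` field are assumptions; the real `HARD_EXPIRED_PATTERNS` may differ.

```javascript
// Hypothetical phrases for illustration; the real HARD_EXPIRED_PATTERNS may differ.
const EXAMPLE_HARD_EXPIRED_PATTERNS = [
  /this (job|position) is no longer available/i,
  /no longer accepting applications/i,
];

// Returns an expired verdict naming the matched phrase, or null if nothing matched.
// The format of `reason` is an assumption made for this sketch.
function checkExpiredBody(bodyText) {
  for (const pattern of EXAMPLE_HARD_EXPIRED_PATTERNS) {
    const match = bodyText.match(pattern);
    if (match) {
      return { result: 'expired', code: 'expired_body', reason: `matched "${match[0]}"` };
    }
  }
  return null;
}
```

Returning `null` on no match lets a caller fall through to the next signal rather than treating the absence of a phrase as evidence that the posting is active.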

### Apply Control Visibility

For active postings, the classifier looks for visible application elements. It filters the DOM for buttons, links, or inputs that:

- Are not hidden via CSS
- Have positive dimensions (width and height)
- Are not contained within `<nav>` or `<footer>` elements

If any collected `applyControls` match `APPLY_PATTERNS`, the posting is considered active.

**Result:** `{ result: 'active', code: 'apply_control_visible' }`
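The filter and match described above can be sketched as two small functions. This is an illustration under assumptions, not the repository's code: the function names and the single pattern are hypothetical, and the style lookup is injected so the filter can also run outside a browser.

```javascript
// Hypothetical pattern; the repository's APPLY_PATTERNS may be broader.
const EXAMPLE_APPLY_PATTERNS = [/\bapply\b/i];

// Visibility filter for a DOM element. `getStyle` is injected; in a page
// context you would pass (el) => window.getComputedStyle(el).
function isVisibleContentControl(el, getStyle) {
  const style = getStyle(el);
  if (style.display === 'none' || style.visibility === 'hidden') return false;
  const rect = el.getBoundingClientRect();
  if (rect.width <= 0 || rect.height <= 0) return false;
  // Skip controls that live inside site navigation or the footer.
  return el.closest('nav, footer') === null;
}

// True when the text of any collected control matches an apply pattern.
function hasApplyControl(applyControls) {
  return applyControls.some((control) =>
    EXAMPLE_APPLY_PATTERNS.some((pattern) => pattern.test(control.text))
  );
}
```

Excluding `<nav>` and `<footer>` matters because many career sites place a generic "Apply" or "Careers" link in their page chrome, which would otherwise make every page look active.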

### Listing Page Detection

Some Applicant Tracking Systems redirect removed postings to generic search results pages rather than showing a 404. The classifier identifies these by matching the final URL against `LISTING_PAGE_PATTERNS`.

**Result:** `{ result: 'expired', code: 'listing_page' }`

### Content Length Validation

If `bodyText` contains fewer than `MIN_CONTENT_CHARS` (300 characters), the page is likely just a navigation skeleton or error template without actual job content.

**Result:** `{ result: 'expired', code: 'insufficient_content' }`

### Fallback for Uncertain States

When none of the above signals triggers, meaning no apply controls are visible and no expiration signals are detected, the classifier reports uncertainty rather than guessing.

**Result:** `{ result: 'uncertain', code: 'no_apply_control' }`
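To make the priority order concrete, the sketch below strings the rules together as a first-match-wins chain. It is not the repository's `classifyLiveness()`: the function name `classifySketch` is hypothetical, the regex-based rules are represented by injected predicates that default to "no match", and measuring raw `bodyText.length` without trimming is an assumption.

```javascript
const MIN_CONTENT_CHARS = 300; // threshold stated in this article

// Rules run top to bottom; the first one that matches decides the verdict.
// The predicates stand in for the pattern-based checks and are hypothetical.
function classifySketch(input, predicates = {}) {
  const { status, finalUrl, bodyText, applyControls } = input;
  const never = () => false;
  const {
    isExpiredUrl = never,
    hasExpiredText = never,
    hasApplyControl = never,
    isListingPage = never,
  } = predicates;

  if (status === 404 || status === 410) return { result: 'expired', code: 'http_gone' };
  if (isExpiredUrl(finalUrl)) return { result: 'expired', code: 'expired_url' };
  if (hasExpiredText(bodyText)) return { result: 'expired', code: 'expired_body' };
  if (hasApplyControl(applyControls)) return { result: 'active', code: 'apply_control_visible' };
  if (isListingPage(finalUrl)) return { result: 'expired', code: 'listing_page' };
  if (bodyText.length < MIN_CONTENT_CHARS) return { result: 'expired', code: 'insufficient_content' };
  return { result: 'uncertain', code: 'no_apply_control' };
}

const gone = classifySketch({ status: 410, finalUrl: 'https://jobs.example.com/x', bodyText: '', applyControls: [] });
console.log(gone.code); // http_gone
```

Because the apply-control rule sits above the listing-page and content-length rules, a short page with a visible apply button is still reported as active in this ordering.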

## Implementation in liveness-browser.mjs

The `liveness-browser.mjs` module manages the browser lifecycle and data preparation before classification.

### Safety Guards

Before navigation, the `rejectPrivateOrInvalid()` function blocks non-HTTP(S) protocols and private network hosts (localhost, `127.0.0.1`, `192.168.x.x`, etc.), preventing the scanner from hitting internal infrastructure.
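The general shape of such a guard can be sketched with the standard `URL` class. The function name `describeUnsafeTarget`, its return convention, and the exact list of blocked ranges are assumptions made for illustration; the repository's `rejectPrivateOrInvalid()` may check more or fewer cases.

```javascript
// Returns a rejection reason string, or null when the URL looks safe to visit.
// The blocked ranges below are an assumed list, not the repository's.
function describeUnsafeTarget(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return 'invalid URL';
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return 'unsupported protocol';
  }
  const host = parsed.hostname.toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost') || host === '[::1]') {
    return 'loopback host';
  }
  const octets = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (octets) {
    const [a, b] = octets.slice(1).map(Number);
    const isPrivate =
      a === 10 ||
      a === 127 ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 169 && b === 254);
    if (isPrivate) return 'private IPv4 address';
  }
  return null;
}

console.log(describeUnsafeTarget('http://192.168.1.10/jobs')); // private IPv4 address
console.log(describeUnsafeTarget('https://jobs.example.com/role')); // null
```

This sketch inspects only the literal hostname. It does not resolve DNS, so a public hostname that points at a private address would pass; a stricter guard would need to check the resolved address as well.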

### Navigation Strategy

The module uses Playwright to load pages with a **15-second timeout**, then waits an additional **2 seconds** to allow for Single Page Application hydration. The fixed wait gives JavaScript-rendered content time to appear before extraction, although it cannot guarantee that every page has finished rendering.

### Data Extraction

After navigation completes, the script captures:

- `response.status()` for HTTP code analysis
- `page.url()` for redirect tracking
- `bodyText` via page content extraction
- `applyControls` via DOM querying that excludes hidden elements and navigation or footer containers

All collected data is packaged into an object and passed to `classifyLiveness()`.
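The navigation and extraction steps above can be sketched as a single helper. This is a hedged illustration rather than the module's actual code: `gatherPageSignals` and `collectApplyControls` are hypothetical names, reading the body with `page.innerText('body')` is an assumption about how `bodyText` is obtained, and using `0` for a missing response is a choice made for this sketch. The field names follow the object shape used with `classifyLiveness()` in this article.

```javascript
const NAV_TIMEOUT_MS = 15000; // navigation timeout described in this article
const HYDRATION_WAIT_MS = 2000; // fixed post-load wait described in this article

// `page` is a Playwright Page. `collectApplyControls` is a hypothetical callback
// standing in for the repository's DOM query for visible apply controls.
async function gatherPageSignals(page, url, collectApplyControls) {
  const response = await page.goto(url, { timeout: NAV_TIMEOUT_MS });
  await page.waitForTimeout(HYDRATION_WAIT_MS);
  return {
    // goto() can resolve to null in some cases, hence the guard.
    status: response ? response.status() : 0,
    finalUrl: page.url(),
    bodyText: await page.innerText('body'),
    applyControls: await collectApplyControls(page),
  };
}
```

Because the helper only calls methods on the object it is given, it can be exercised in tests with a stub page, while production code passes a real Playwright `Page`.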

## Using the Classifier in Your Code

You can integrate the classifier into existing pipelines using either the high-level browser wrapper or the core logic directly.

### Checking a URL with Browser Automation

Use `checkUrlLiveness()` from `liveness-browser.mjs` when you need full browser rendering:

```javascript
import { chromium } from 'playwright';
import { checkUrlLiveness } from './liveness-browser.mjs';

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  const url = 'https://jobs.example.com/role-software-engineer-12345';
  const { result, code, reason } = await checkUrlLiveness(page, url);

  console.log(`Liveness: ${result} (code: ${code}) – ${reason}`);
  await browser.close();
})();
```

### Direct Core Classification

For unit testing or server-side rendering scenarios where you already have page data, import `classifyLiveness()` from `liveness-core.mjs`:

```javascript
import { classifyLiveness } from './liveness-core.mjs';

const sample = {
  status: 200,
  finalUrl: 'https://jobs.example.com/role-software-engineer-12345',
  bodyText: '...apply now...',
  applyControls: [{ text: 'Apply Now', tag: 'button' }]
};

const { result, code, reason } = classifyLiveness(sample);
console.log(result, code, reason);
// Expected: "active", "apply_control_visible", "visible apply control detected"
```

## Summary

- **The `liveness-browser.mjs` classifier** uses Playwright to gather page data and delegates evaluation to `liveness-core.mjs`.
- **Six primary signals** determine status: HTTP status codes (`404`/`410`), expired URL patterns, hard-coded expiration phrases, visible apply controls, listing page redirects, and minimum content length (300 characters).
- **Safety and timing measures** include private network blocking via `rejectPrivateOrInvalid()`, a 15-second navigation timeout, and a 2-second wait for SPA hydration.
- **Return values** are standardized objects containing `result` (`expired`, `active`, or `uncertain`), `code` (specific trigger identifier), and `reason` (human-readable explanation).
- **Integration options** include high-level `checkUrlLiveness()` for live URLs or direct `classifyLiveness()` for testing with static data.

## Frequently Asked Questions

### How does liveness-browser.mjs handle Single Page Applications?

The classifier accommodates SPAs by waiting **2 seconds** after the initial navigation completes (which uses a 15-second timeout). This hydration delay allows JavaScript frameworks to render job content and apply buttons that would not exist in the initial HTML payload.

### What is the minimum content length threshold, and why?

The classifier requires `bodyText` to contain at least **`MIN_CONTENT_CHARS` (300 characters)**. Pages shorter than this threshold typically represent navigation shells, error pages, or generic redirects without actual job descriptions, triggering the `insufficient_content` expiration code.

### Can I use the classifier without running a full browser?

Yes. While `liveness-browser.mjs` requires Playwright for live URL checking, you can import `classifyLiveness()` directly from `liveness-core.mjs` to classify pre-scraped data. This is useful for unit testing (as demonstrated in `test-all.mjs`) or when integrating with existing crawling infrastructure that already provides page content, status codes, and final URLs.

### Why does the classifier return "uncertain" instead of defaulting to active or expired?

The `uncertain` result with code `no_apply_control` acts as a safety mechanism when no apply buttons are detected but no explicit expiration signals are present. Such pages may require authentication, rely on heavy JavaScript that obscures controls, or use unconventional ATS layouts that hide apply elements until interaction. Reporting uncertainty prompts manual review instead of risking a wrong automatic verdict.