# How Playwright Liveness Verification Detects Expired Job Postings in Career-Ops

> Learn how Playwright liveness verification in Career-Ops identifies expired job postings, and how its two-stage pipeline produces accurate status detection.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: how-to-guide
- Published: 2026-06-09

---

**Career-Ops uses a two-stage Playwright pipeline that first navigates to the job URL to capture the final redirect, page text, and apply buttons, then applies deterministic classification rules to mark postings as expired, active, or uncertain.**

The open-source Career-Ops project (santifer/career-ops) automates job posting verification using a Playwright-based liveness detection system. This article explains how the codebase distinguishes between active opportunities and expired listings through a combination of network-level heuristics and DOM content analysis.

## The Two-Stage Verification Architecture

The liveness verification system splits detection between browser automation and pure classification logic. This separation allows for both robust page interaction and deterministic rule evaluation without side effects.

### Stage 1: Playwright Navigation and Data Extraction

In `liveness-browser.mjs`, the `checkUrlLiveness` function launches a headless Chromium instance to gather raw page telemetry:

- The final URL after all redirects (`finalUrl`)
- The raw inner text of the document body (`bodyText`)
- A filtered list of visible apply controls (`applyControls`)
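The three signals above could be gathered roughly as follows. This is a minimal sketch, not the project's code: `extractPageSignals`, the selector list, the `/apply/i` label filter, and the 15-second timeout are all assumptions made for illustration. It expects an already-created Playwright `page`.

```javascript
// Hypothetical helper (not from career-ops): collects raw signals from a Playwright page.
async function extractPageSignals(page, url, timeoutMs = 15000) {
  // page.goto resolves to the main-resource response, or null in some cases.
  const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
  const status = response ? response.status() : 0;

  // After navigation settles, page.url() reflects the final redirect target.
  const finalUrl = page.url();
  const bodyText = await page.innerText('body');

  // Keep only elements that occupy space on screen and whose label mentions "apply".
  // Both the selector and the label filter are assumptions for this sketch.
  const applyControls = await page
    .locator('a, button, input[type="submit"]')
    .evaluateAll((elements) =>
      elements
        .filter((el) => {
          const rect = el.getBoundingClientRect();
          return rect.width > 0 && rect.height > 0;
        })
        .map((el) => (el.innerText || el.value || '').trim())
        .filter((label) => /apply/i.test(label))
    );

  return { status, finalUrl, bodyText, applyControls };
}
```

Keeping extraction free of any verdict logic means the returned object can be handed to a pure classifier and replayed in tests without a browser.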

Before navigation begins, the `rejectPrivateOrInvalid` guard blocks non-HTTP protocols, localhost, and private network addresses, returning an immediate `invalid_url` or `blocked_host` result for security.
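A guard of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's implementation: the function name `rejectPrivateOrInvalidSketch`, the exact list of blocked ranges, and the `{ code }` return shape are hypothetical, and only the two codes `invalid_url` and `blocked_host` come from the description above.

```javascript
// Hypothetical sketch of a pre-navigation URL guard (not the career-ops source).
const PRIVATE_IPV4_RANGES = [
  /^10\./,                        // 10.0.0.0/8
  /^127\./,                       // loopback
  /^169\.254\./,                  // link-local
  /^172\.(1[6-9]|2\d|3[01])\./,   // 172.16.0.0/12
  /^192\.168\./,                  // 192.168.0.0/16
];

function rejectPrivateOrInvalidSketch(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return { code: 'invalid_url' };
  }

  // Only plain web protocols are allowed through.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { code: 'invalid_url' };
  }

  const host = parsed.hostname.toLowerCase();
  const isLocalName = host === 'localhost' || host.endsWith('.localhost');
  const isLoopbackV6 = host === '[::1]';
  // Apply range checks only to IPv4 literals, so "10.example.com" is not blocked.
  const isIPv4Literal = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
  const isPrivateV4 = isIPv4Literal && PRIVATE_IPV4_RANGES.some((re) => re.test(host));

  if (isLocalName || isLoopbackV6 || isPrivateV4) {
    return { code: 'blocked_host' };
  }
  return null; // null means the URL may be navigated
}
```

A guard based only on the URL string does not protect against hostnames that resolve to private addresses through DNS; covering that case would require a resolution step, which this sketch leaves out.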

### Stage 2: Deterministic Classification

The `classifyLiveness` function in `liveness-core.mjs` receives the extracted data and applies a priority-ordered rule set. Each evaluation returns a structured object containing `result` (expired/active/uncertain), `code`, and `reason`.

The classification hierarchy evaluates conditions in this exact order:

1. **HTTP Status Validation** – Checks whether `status` is `404` or `410`, returning **expired** with code `http_gone`.

2. **Redirect URL Analysis** – Matches `finalUrl` against `EXPIRED_URL_PATTERNS` (e.g., `?error=true`), returning **expired** with code `expired_url`.

3. **Body Text Heuristics** – Searches `bodyText` for `HARD_EXPIRED_PATTERNS` like "job no longer available" or "position has been filled", returning **expired** with code `expired_body`.

4. **Apply Control Detection** – If `hasApplyControl(applyControls)` identifies visible application buttons or links, returns **active** with code `apply_control_visible`.

5. **Listing Page Detection** – Matches `LISTING_PAGE_PATTERNS` indicating search results pages (e.g., "12 jobs found"), returning **expired** with code `listing_page`.

6. **Content Sufficiency Check** – Validates `bodyText` length against `MIN_CONTENT_CHARS` (default 300 characters), returning **expired** with code `insufficient_content` if the content is too short (typical of navigation-only pages).

7. **Fallback Classification** – Returns **uncertain** with code `no_apply_control` if no preceding rules match.
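Because the order decides the outcome, the seven steps can be pictured as an ordered rule table in which the first matching rule wins. The sketch below is illustrative only: the pattern arrays are small hypothetical stand-ins for the real lists in `liveness-core.mjs`, `classifySketch` is not the project's function, and the `reason` field is left out for brevity.

```javascript
// Hypothetical stand-ins; the real pattern lists live in liveness-core.mjs.
const SKETCH_EXPIRED_URL_PATTERNS = [/[?&]error=true/i];
const SKETCH_HARD_EXPIRED_PATTERNS = [/job no longer available/i, /position has been filled/i];
const SKETCH_LISTING_PAGE_PATTERNS = [/\b\d+\s+jobs\s+found\b/i];
const SKETCH_MIN_CONTENT_CHARS = 300;

const matchesAny = (patterns, text) => patterns.some((p) => p.test(text));

// Rules in priority order; the first rule whose test passes decides the verdict.
const SKETCH_RULES = [
  { result: 'expired', code: 'http_gone', test: (s) => s.status === 404 || s.status === 410 },
  { result: 'expired', code: 'expired_url', test: (s) => matchesAny(SKETCH_EXPIRED_URL_PATTERNS, s.finalUrl) },
  { result: 'expired', code: 'expired_body', test: (s) => matchesAny(SKETCH_HARD_EXPIRED_PATTERNS, s.bodyText) },
  { result: 'active', code: 'apply_control_visible', test: (s) => s.applyControls.length > 0 },
  { result: 'expired', code: 'listing_page', test: (s) => matchesAny(SKETCH_LISTING_PAGE_PATTERNS, s.bodyText) },
  { result: 'expired', code: 'insufficient_content', test: (s) => s.bodyText.trim().length < SKETCH_MIN_CONTENT_CHARS },
];

function classifySketch({ status = 200, finalUrl = '', bodyText = '', applyControls = [] }) {
  const signals = { status, finalUrl, bodyText, applyControls };
  const hit = SKETCH_RULES.find((rule) => rule.test(signals));
  return hit
    ? { result: hit.result, code: hit.code }
    : { result: 'uncertain', code: 'no_apply_control' };
}

// A page that still shows an apply button but says the role is filled is expired,
// because the body-text rule sits above the apply-control rule.
console.log(classifySketch({ bodyText: 'This position has been filled.', applyControls: ['Apply'] }));
```

The same ordering explains why a short page with a visible apply control is still classified as active: the content-length check is only reached when no apply control was found.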

## Integration with the Scan Workflow

When running `node scan.mjs --verify`, the `verifyOffers` helper orchestrates the verification batch. It iterates over discovered URLs, invokes `checkUrlLiveness`, and routes results into three categories:

- **Active** – Postings with visible apply controls proceed to the verified pipeline.
- **Expired** – URLs flagged as expired are written to `scan-history.tsv` with status `skipped_expired`.
- **Uncertain** – Results lacking apply controls are treated as `dropped` and logged as `skipped_no_apply_control`, while navigation errors remain uncertain for retry on subsequent scans.
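The routing described above can be sketched as a small pure function. Everything here other than the status strings `skipped_expired` and `skipped_no_apply_control` is an assumption: the helper names, the bucket names, and the `navigation_error` code used in the example are hypothetical and do not come from `scan.mjs`.

```javascript
// Hypothetical routing helper; bucket names are illustrative, not from scan.mjs.
function routeLivenessVerdict({ result, code }) {
  if (result === 'active') {
    return { bucket: 'verified', historyStatus: null };
  }
  if (result === 'expired') {
    return { bucket: 'dropped', historyStatus: 'skipped_expired' };
  }
  if (code === 'no_apply_control') {
    return { bucket: 'dropped', historyStatus: 'skipped_no_apply_control' };
  }
  // Anything else (for example a navigation error) stays uncertain for a later retry.
  return { bucket: 'retry', historyStatus: null };
}

// Groups a batch of { url, verdict } entries by the bucket each verdict routes to.
function partitionVerdicts(entries) {
  const buckets = { verified: [], dropped: [], retry: [] };
  for (const { url, verdict } of entries) {
    const { bucket, historyStatus } = routeLivenessVerdict(verdict);
    buckets[bucket].push({ url, historyStatus });
  }
  return buckets;
}

const batch = partitionVerdicts([
  { url: 'https://example.com/job/1', verdict: { result: 'active', code: 'apply_control_visible' } },
  { url: 'https://example.com/job/2', verdict: { result: 'expired', code: 'http_gone' } },
  { url: 'https://example.com/job/3', verdict: { result: 'uncertain', code: 'no_apply_control' } },
  { url: 'https://example.com/job/4', verdict: { result: 'uncertain', code: 'navigation_error' } }, // hypothetical code
]);
console.log(batch.verified.length, batch.dropped.length, batch.retry.length);
```

Separating the routing decision from the file writes keeps the history format out of the decision logic, so the mapping can be tested without touching `scan-history.tsv`.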

## Security and Reliability Guardrails

The `rejectPrivateOrInvalid` utility in `liveness-browser.mjs` prevents the browser from accessing internal infrastructure by validating URLs before page load. Private IP ranges and unsupported protocols are rejected before any navigation takes place, which protects both the scanning infrastructure and target systems.

## Practical Usage Examples

To check a single URL programmatically:

```javascript
import { chromium } from 'playwright';
import { checkUrlLiveness } from './liveness-browser.mjs';

async function validateJob(url) {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // The verdict is a structured object: result, code, and reason.
    const { result, code, reason } = await checkUrlLiveness(page, url);
    console.log(`${url} → ${result} (${code}): ${reason}`);
  } finally {
    // Close the browser even if navigation or classification throws.
    await browser.close();
  }
}

validateJob('https://example.com/job/123').catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

The core classification logic follows this deterministic flow:

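```javascript
// Simplified view of classifyLiveness. The pattern constants are defined in
// liveness-core.mjs; the reason strings below are illustrative.
function classifyLiveness({ status, finalUrl, bodyText, applyControls }) {
  if (status === 404 || status === 410)
    return { result: 'expired', code: 'http_gone', reason: `HTTP ${status}` };

  if (EXPIRED_URL_PATTERNS.some((p) => p.test(finalUrl)))
    return { result: 'expired', code: 'expired_url', reason: 'final URL matches an expired pattern' };

  if (HARD_EXPIRED_PATTERNS.some((p) => p.test(bodyText)))
    return { result: 'expired', code: 'expired_body', reason: 'body text matches an expired pattern' };

  // Simplified stand-in for hasApplyControl(applyControls).
  if (applyControls.length > 0)
    return { result: 'active', code: 'apply_control_visible', reason: 'visible apply control found' };

  // The listing_page and insufficient_content checks are omitted for brevity.
  return { result: 'uncertain', code: 'no_apply_control', reason: 'no visible apply control' };
}
```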

## Summary

- **Playwright liveness verification** in Career-Ops separates data collection from classification logic across `liveness-browser.mjs` and `liveness-core.mjs`.
- The system prioritizes **early-exit checks** for HTTP errors and expired URL patterns before analyzing page content.
- **Visible apply controls** serve as the primary signal for active postings, ensuring only actionable job links reach users.
- Security guardrails in `rejectPrivateOrInvalid` prevent scanning of private networks and invalid protocols.
- Integration via `verifyOffers` in `scan.mjs` provides automated batch processing with persistent history tracking.

## Frequently Asked Questions

### What triggers the "insufficient_content" classification?

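When no earlier rule has matched and the extracted `bodyText` contains fewer than 300 characters (the default value of `MIN_CONTENT_CHARS`), the system classifies the posting as expired with code `insufficient_content`. This heuristic catches pages that render only navigation, footers, or skeleton loaders without actual job details. A short page that still shows a visible apply control never reaches this check, because the apply-control rule runs first.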

### How does the system handle redirects to expired job pages?

The `checkUrlLiveness` function captures the `finalUrl` after all redirects complete. If this final URL matches patterns defined in `EXPIRED_URL_PATTERNS` (such as query parameters containing `error=true`), the classifier immediately returns `expired_url` before inspecting the page body.

### Can the liveness checker distinguish between a filled position and a removed posting?

Not at the level of `result` or `code`. The `HARD_EXPIRED_PATTERNS` array includes distinct phrases such as "position has been filled" and "job no longer available", but a match on either one produces the same `expired` classification with code `expired_body`. Any finer distinction would have to come from the accompanying `reason` field.

### Why are some URLs classified as "uncertain" rather than expired?

URLs receive `uncertain` status when they load successfully but no rule fires: no hard expiration pattern, no visible apply control, no listing-page match, and enough body text to pass the content check. JavaScript-heavy pages with delayed rendering are a typical example. In the scan workflow these results are logged as `skipped_no_apply_control` rather than `skipped_expired`, while navigation errors stay uncertain so that a later scan can retry them.