# How analyze-patterns.mjs Identifies Rejection Patterns from Historical Application Data

> Discover how analyze-patterns.mjs identifies rejection patterns by extracting, normalizing, and aggregating blocker data from historical application reports.

- Repository: [Santiago Fernández de Valderrama/career-ops](https://github.com/santifer/career-ops)
- Tags: how-to-guide
- Published: 2026-06-07

---

**The `analyze-patterns.mjs` script parses the [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) tracker and every linked markdown report to extract, normalize, and aggregate blocker data, then surfaces the most frequent rejection reasons through a multi-stage pipeline.**

The **`analyze-patterns.mjs`** script in the **`santifer/career-ops`** repository reads the central application tracker and the individual markdown reports, then runs a structured pipeline to **identify rejection patterns from historical application data**. The output is a ranked list of blocker categories, such as geo-restrictions or stack mismatches, paired with concrete recommendations for refining your search strategy.

## Loading Historical Application Data

The pipeline begins by locating the master tracker. The script reads **[`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md)**, falling back to an **[`applications.md`](https://github.com/santifer/career-ops/blob/main/applications.md)** file at the repository root if the primary path is missing.
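The lookup order described above can be sketched as follows. This is a minimal illustration rather than the script's actual code: `resolveTrackerPath`, `TRACKER_CANDIDATES`, and the injected `exists` predicate are hypothetical names introduced here.

```javascript
// Hypothetical sketch of the tracker lookup order; not copied from analyze-patterns.mjs.
const TRACKER_CANDIDATES = ['data/applications.md', 'applications.md'];

// `exists` is any predicate that reports whether a path is present,
// for example `existsSync` from node:fs.
function resolveTrackerPath(exists, candidates = TRACKER_CANDIDATES) {
  // Return the first candidate that exists, or null when none do.
  return candidates.find(path => exists(path)) ?? null;
}

// Example: only the root-level tracker is present.
const present = new Set(['applications.md']);
console.log(resolveTrackerPath(path => present.has(path))); // applications.md
```

Injecting the existence check keeps the sketch free of filesystem access, so the fallback order can be exercised without creating files.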

### Parsing the applications.md Tracker

Inside **`parseTracker`** (lines 53-70), the script keeps every line that begins with `|`, skips the header and separator rows, and splits each remaining table row into columns such as `num`, `date`, `company`, `role`, `score`, and `status`.

```javascript
// analyze-patterns.mjs (lines 53-70)
function parseTracker(content) {
  return content
    .split('\n')
    .filter(line => line.trim().startsWith('|'))
    .slice(2) // skip header and separator
    .map(line => {
      const cols = line.split('|').map(c => c.trim());
      return {
        num: cols[1],
        date: cols[2],
        company: cols[3],
        role: cols[4],
        score: cols[5],
        status: cols[6]
      };
    });
}
```

### Resolving Linked Report Files

For each tracker entry, the script resolves the embedded markdown link to its corresponding report file under **`reports/…md`**. It then reads the file contents into memory for downstream parsing. This file-read logic lives near **`parseReport`** (lines 73-78).
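A minimal sketch of the link-resolution step is shown below. It assumes the tracker cell holds a standard markdown link; `extractReportPath` is a hypothetical helper, and the sample file name is made up for illustration.

```javascript
// Hypothetical helper: pull the target of the first markdown link out of a tracker cell.
// Assumes the cell looks like "[001](reports/001-acme.md)"; the real column layout may differ.
function extractReportPath(cell) {
  const match = /\[[^\]]*\]\(([^)\s]+)\)/.exec(cell);
  return match ? match[1] : null;
}

console.log(extractReportPath('[001](reports/001-acme.md)')); // reports/001-acme.md
console.log(extractReportPath('no link here'));               // null
```

The returned path could then be handed to a file read such as `readFileSync(path, 'utf8')` from `node:fs`, with a `null` result signalling a tracker row that has no report to load.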

## Normalizing Statuses and Classifying Outcomes

Before analyzing blockers, the script standardizes messy human-entered statuses into a strict taxonomy. This ensures that variants such as “Rechazado” and “rejected” are treated as the same canonical signal.

### Mapping Raw Statuses with ALIASES

The **`normalizeStatus`** function (lines 54-68) lower-cases the raw string, strips markdown formatting, and maps it through the **`ALIASES`** table to a canonical value such as `evaluated`, `applied`, `rejected`, or `skip`.

```javascript
// analyze-patterns.mjs (lines 54-68)
const ALIASES = {
  'rechazado': 'rejected',
  'applied': 'applied',
  'skip': 'skip',
  'evaluated': 'evaluated'
};

function normalizeStatus(raw) {
  const clean = raw.toLowerCase().replace(/[*_]/g, '');
  return ALIASES[clean] || clean;
}
```

### Grouping Applications by Outcome

The **`classifyOutcome`** function (lines 75-81) places each canonical status into one of four outcome buckets:

- **positive** – `interview`, `offer`, `responded`, `applied`
- **negative** – `rejected`, `discarded`
- **self_filtered** – `skip`
- **pending** – still being evaluated

This classification lets the script isolate **negative** outcomes when it later counts rejection blockers.
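The bucket list above can be expressed as a simple lookup. The sketch below is an assumption about how such a mapping might look, not the script's actual implementation; in particular, treating every unlisted status as `pending` is a guess.

```javascript
// Hypothetical sketch of the outcome buckets listed above; not the script's actual code.
const OUTCOME_BY_STATUS = new Map([
  ['interview', 'positive'],
  ['offer', 'positive'],
  ['responded', 'positive'],
  ['applied', 'positive'],
  ['rejected', 'negative'],
  ['discarded', 'negative'],
  ['skip', 'self_filtered']
]);

function classifyOutcome(status) {
  // Assumption: anything not listed (such as `evaluated`) counts as pending.
  return OUTCOME_BY_STATUS.get(status) ?? 'pending';
}

const statuses = ['applied', 'rejected', 'skip', 'evaluated', 'rejected'];
const negatives = statuses.filter(s => classifyOutcome(s) === 'negative');
console.log(negatives.length); // 2
```

A `Map` is used instead of a plain object so that status strings such as `constructor` cannot accidentally resolve to inherited properties.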

## Extracting Structured Data from Markdown Reports

Once the tracker and outcomes are loaded, the script pulls structured signal from each markdown report. It prefers an explicit machine-readable summary and degrades gracefully to regex-based extraction when that summary is absent.

### Machine Summary Block Parsing

The **`parseMachineSummary`** function (lines 96-108) looks for a fenced YAML or JSON block under a `## Machine Summary` heading. It parses the block with **`js-yaml`** and retains only the fields declared in **`MACHINE_SUMMARY_FIELDS`**, such as `company`, `role`, `score`, `hard_stops`, and `soft_gaps`. This gives the script a clean, structured record when the user has provided one.

```javascript
// analyze-patterns.mjs (lines 96-108)
import yaml from 'js-yaml';

const MACHINE_SUMMARY_FIELDS = [
  'company', 'role', 'score', 'hard_stops', 'soft_gaps'
];

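function parseMachineSummary(reportContent) {
  const match = reportContent.match(
    /## Machine Summary\s*```(?:yaml|json)\n([\s\S]*?)\n```/
  );
  if (!match) return null;
  const parsed = yaml.load(match[1]);
  // Guard against empty or scalar blocks, which do not parse to an object.
  if (parsed === null || typeof parsed !== 'object') return null;
  // Keep only the whitelisted fields.
  return Object.fromEntries(
    MACHINE_SUMMARY_FIELDS.map(f => [f, parsed[f]])
  );
}
```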

### Fallback Regex Extraction

When a report lacks a **Machine Summary**, the script falls back to regex-based parsing of the classic markdown blocks (lines 27-60). This captures Block A metadata, the scoring table, and the gaps table, extracting archetype, seniority, remote policy, team size, compensation, domain, scores such as `cvMatch` and `global`, and the full gap list.
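To make the fallback idea concrete, here is a rough sketch of regex-based extraction. The field label format (`**Archetype:** ...`), the two-column gaps table, and the helper names `extractField` and `extractGaps` are all assumptions for illustration; the real report layout and parsing code may differ.

```javascript
// Hypothetical fallback parser. The field label and table layout below are
// assumptions for illustration; the real report format may differ.
function extractField(report, label) {
  // Matches lines such as "**Archetype:** Platform Engineer".
  // `label` is expected to be a plain word without regex metacharacters.
  const match = new RegExp(`^\\*\\*${label}:\\*\\*\\s*(.+)$`, 'mi').exec(report);
  return match ? match[1].trim() : null;
}

function extractGaps(report) {
  return report
    .split('\n')
    .filter(line => line.trim().startsWith('|'))
    .map(line => line.split('|').slice(1, -1).map(cell => cell.trim()))
    // Keep only data rows whose second cell names a gap type.
    .filter(cells => /^(hard_stop|soft_gap)$/.test(cells[1] ?? ''))
    .map(([description, type]) => ({ description, type }));
}

const sample = [
  '**Archetype:** Platform Engineer',
  '',
  '| Gap | Type |',
  '|-----|------|',
  '| US-only location | hard_stop |',
  '| No Kubernetes experience | soft_gap |'
].join('\n');

console.log(extractField(sample, 'Archetype')); // Platform Engineer
console.log(extractGaps(sample).length);        // 2
```

Filtering on the type cell drops the header and separator rows without relying on their position, which makes the sketch tolerant of extra tables elsewhere in the report.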

## Identifying Rejection Patterns from Historical Application Data

With structured gaps in hand, the script focuses on its core task: to **identify rejection patterns from historical application data** by isolating the hard stops that drove negative outcomes. This phase turns qualitative report text into a quantitative blocker histogram.

### Categorizing Hard-Stop Gaps

The **`extractBlockerType`** function (lines 33-41) maps a gap description to a discrete blocker category by matching it against keyword patterns. Only hard-stop gaps are categorized; soft gaps are skipped during aggregation. The supported categories are `geo-restriction`, `stack-mismatch`, `seniority-mismatch`, and `onsite-requirement`, with `other` as the fallback when no keyword matches.

```javascript
// analyze-patterns.mjs (lines 33-41)
function extractBlockerType(gapDescription) {
  const desc = gapDescription.toLowerCase();
  if (desc.includes('geo') || desc.includes('location'))
    return 'geo-restriction';
  if (desc.includes('stack') || desc.includes('tech'))
    return 'stack-mismatch';
  if (desc.includes('senior'))
    return 'seniority-mismatch';
  if (desc.includes('onsite') || desc.includes('office'))
    return 'onsite-requirement';
  return 'other';
}
```

### Aggregating Blocker Frequencies

Across all reports, blocker categories are tallied into a **`blockerCounts`** histogram (lines 34-45). The script then normalizes these counts into frequency and percentage values, producing a ranked list of `{ blocker, frequency, percentage }` records. This makes it trivial to see which rejection patterns dominate the dataset.

```javascript
// analyze-patterns.mjs (lines 34-45)
const blockerCounts = {};

reports.forEach(report => {
  report.gaps?.forEach(gap => {
    if (gap.type === 'hard_stop') {
      const blocker = extractBlockerType(gap.description);
      blockerCounts[blocker] = (blockerCounts[blocker] || 0) + 1;
    }
  });
});

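const total = Object.values(blockerCounts).reduce((a, b) => a + b, 0);
const frequencyTable = Object.entries(blockerCounts)
  .map(([blocker, count]) => ({
    blocker,
    frequency: count,
    // Keep percentage numeric so later threshold checks compare numbers.
    percentage: Number(((count / total) * 100).toFixed(1))
  }))
  // Rank the most frequent blockers first.
  .sort((a, b) => b.frequency - a.frequency);
```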

## Insights and Recommendations

Beyond raw blocker counts, the script segments applications by context and applies percentage thresholds, turning the aggregated numbers into concrete next steps.

### Contextual Segmentation

Helper functions such as **`classifyRemote`** and **`classifyCompanySize`** (lines 102-130) group applications by remote policy, company size, and archetype. The script then computes conversion rates, score statistics, and a recommended minimum score threshold.
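The per-segment conversion rate can be sketched as below. The helper name `conversionBySegment`, the `outcome` and `remote` fields, and the definition of conversion as the share of `positive` outcomes are assumptions for illustration, not details confirmed by the script.

```javascript
// Hypothetical sketch of per-segment conversion rates; field names are assumptions.
function conversionBySegment(applications, segmentOf) {
  const stats = new Map();
  for (const app of applications) {
    const key = segmentOf(app);
    const entry = stats.get(key) ?? { total: 0, positive: 0 };
    entry.total += 1;
    if (app.outcome === 'positive') entry.positive += 1;
    stats.set(key, entry);
  }
  // Convert raw counts into a rate between 0 and 1 for each segment.
  return [...stats].map(([segment, { total, positive }]) => ({
    segment,
    total,
    rate: positive / total
  }));
}

const apps = [
  { remote: 'remote', outcome: 'positive' },
  { remote: 'remote', outcome: 'negative' },
  { remote: 'onsite', outcome: 'negative' }
];
console.log(conversionBySegment(apps, app => app.remote));
// remote: 1 of 2 positive (rate 0.5); onsite: 0 of 1 positive (rate 0)
```

Passing the segment function as an argument lets the same aggregation serve remote policy, company size, or archetype without duplicating the counting logic.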

### Threshold-Based Recommendations

If a blocker type exceeds a specific percentage threshold—such as **≥ 20 % geo-restriction** or **≥ 15 % stack-mismatch**—the script emits a concrete recommendation (lines 120-158). These suggestions point directly to actions like tightening filters in **[`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml)** or raising the minimum application score.

```javascript
// analyze-patterns.mjs (lines 120-158)
function generateRecommendations(frequencyTable) {
  const recommendations = [];
  frequencyTable.forEach(({ blocker, percentage }) => {
    if (blocker === 'geo-restriction' && percentage >= 20) {
      recommendations.push(
        'Tighten portals.yml filters to exclude geo-restricted regions.'
      );
    }
    if (blocker === 'stack-mismatch' && percentage >= 15) {
      recommendations.push(
        'Raise the minimum score threshold or refine CV keywords.'
      );
    }
  });
  return recommendations;
}
```

## Summary

- `analyze-patterns.mjs` reads [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) and every linked `reports/…md` file to build a complete historical dataset.
- Raw statuses are normalized via `normalizeStatus` and `ALIASES`, then classified into outcome buckets by `classifyOutcome`.
- Structured data is pulled from fenced **Machine Summary** blocks when available; otherwise regex fallbacks parse classic markdown sections.
- Hard-stop gaps are categorized by `extractBlockerType` into blockers such as `geo-restriction` and `stack-mismatch`.
- Blocker frequencies are aggregated into a percentage-ranked histogram.
- Threshold logic in the recommendation engine suggests concrete changes to [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml), score cutoffs, or CV focus areas.

## Frequently Asked Questions

### What primary data source does analyze-patterns.mjs read?

The script targets [`data/applications.md`](https://github.com/santifer/career-ops/blob/main/data/applications.md) inside the repository, falling back to an [`applications.md`](https://github.com/santifer/career-ops/blob/main/applications.md) file at the repo root if the primary tracker is missing. Each row that starts with `|` is parsed into a structured entry by the `parseTracker` function.

### How does the script handle reports without a Machine Summary?

When a report lacks a fenced YAML or JSON block under `## Machine Summary`, `analyze-patterns.mjs` falls back to regex-based extraction of the classic markdown blocks (lines 27-60). This captures the scoring table, gap list, and metadata such as archetype, seniority, and remote policy.

### Why does the analysis ignore soft gaps?

Only **hard-stop** gaps are passed to `extractBlockerType` (lines 33-41) and counted. Soft gaps represent negotiable concerns rather than definitive rejection reasons, so excluding them keeps the blocker histogram focused on the true drivers of negative outcomes.

### How does the script decide which recommendations to show?

Recommendations are generated when a blocker crosses a percentage threshold—for example, **≥ 20 % geo-restriction** or **≥ 15 % stack-mismatch** (lines 120-158). When a threshold is breached, the script emits a specific action such as tightening [`portals.yml`](https://github.com/santifer/career-ops/blob/main/portals.yml) filters or raising the minimum application score.