How analyze-patterns.mjs Identifies Rejection Patterns from Historical Application Data

The analyze-patterns.mjs script parses the data/applications.md tracker and every linked markdown report to extract, normalize, and aggregate blocker data, then surfaces the most frequent rejection reasons through a multi-stage pipeline.

The analyze-patterns.mjs script in the santifer/career-ops repository turns raw job-search history into guidance you can act on. It reads the central application tracker and the individual markdown reports, then runs a structured pipeline to identify rejection patterns from historical application data. The output is a ranked list of blocker categories, such as geo-restrictions or stack mismatches, paired with concrete recommendations for refining your search strategy.

Loading Historical Application Data

The pipeline begins by locating the master tracker. The script reads data/applications.md, falling back to an applications.md file at the repository root if the primary path is missing.
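The fallback can be sketched as a small pure function. resolveTrackerPath is a hypothetical name, and the existence check is passed in (for example fs.existsSync) so the logic can be exercised without touching the disk; the real script may structure this differently.

```javascript
// Hypothetical helper: return the first tracker path that exists, or null.
// `exists` is injected (e.g. fs.existsSync) so the lookup is easy to test.
function resolveTrackerPath(root, exists) {
  const candidates = [
    `${root}/data/applications.md`, // primary location
    `${root}/applications.md`,      // fallback at the repository root
  ];
  return candidates.find(p => exists(p)) ?? null;
}
```

In a Node.js module you could call it as resolveTrackerPath(repoDir, existsSync) after importing existsSync from node:fs, with repoDir pointing at the repository directory.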

Parsing the applications.md Tracker

Inside parseTracker (lines 53-70), the script keeps every line that begins with | (after trimming), skips the header and separator rows, and splits each remaining row into columns such as num, date, company, role, score, and status.

// analyze-patterns.mjs (lines 53-70)
function parseTracker(content) {
  return content
    .split('\n')
    .filter(line => line.trim().startsWith('|'))
    .slice(2) // skip header and separator
    .map(line => {
      const cols = line.split('|').map(c => c.trim());
      return {
        num: cols[1],
        date: cols[2],
        company: cols[3],
        role: cols[4],
        score: cols[5],
        status: cols[6]
      };
    });
}

Resolving Linked Report Files

For each tracker entry, the script resolves the embedded markdown link to its corresponding report file under reports/…md. It then reads the file contents into memory for downstream parsing. This file-read logic lives near parseReport (lines 73-78).
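A minimal sketch of the link resolution, assuming each tracker row embeds a standard markdown link whose target is a path relative to the repository root. extractReportPath and loadReport are hypothetical helpers, the file reader is injected to keep the sketch testable, and returning null for a missing file is a choice of this sketch rather than confirmed script behavior.

```javascript
// Hypothetical helper: pull the first markdown link target out of a tracker row,
// e.g. "[Report](reports/example.md)" -> "reports/example.md".
function extractReportPath(row) {
  const match = row.match(/\[[^\]]*\]\(([^)\s]+)\)/);
  return match ? match[1] : null;
}

// Read the linked report. Returns null when the row has no link or the read fails.
// `readFile` is injected, e.g. p => readFileSync(p, 'utf8') from node:fs.
function loadReport(row, root, readFile) {
  const rel = extractReportPath(row);
  if (!rel) return null;
  try {
    return readFile(`${root}/${rel}`);
  } catch {
    return null; // missing or unreadable report file
  }
}
```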

Normalizing Statuses and Classifying Outcomes

Before analyzing blockers, the script standardizes messy human-entered statuses into a strict taxonomy. This ensures that variants such as “Rechazado” and “rejected” are treated as the same canonical signal.

Mapping Raw Statuses with ALIASES

The normalizeStatus function (lines 54-68) lower-cases the raw string, strips markdown formatting, and maps it through the ALIASES table to a canonical value such as evaluated, applied, rejected, or skip.

// analyze-patterns.mjs (lines 54-68)
const ALIASES = {
  'rechazado': 'rejected',
  'applied': 'applied',
  'skip': 'skip',
  'evaluated': 'evaluated'
};

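function normalizeStatus(raw) {
  // Strip bold/italic markers and stray whitespace before the lookup
  const clean = String(raw ?? '').replace(/[*_]/g, '').trim().toLowerCase();
  // Unlisted values (e.g. 'rejected') are already canonical and pass through
  return ALIASES[clean] || clean;
}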

Grouping Applications by Outcome

The classifyOutcome function (lines 75-81) places each canonical status into one of four outcome buckets:

  • positive – interview, offer, responded, applied
  • negative – rejected, discarded
  • self_filtered – skip
  • pending – still being evaluated

This classification lets the script isolate negative outcomes when it later counts rejection blockers.
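The bucketing can be expressed as a lookup table. This sketch mirrors the four buckets listed above; OUTCOME_BUCKETS and negativeOnly are hypothetical names, and treating every unlisted status as pending is an assumption.

```javascript
// Status lists mirror the buckets described above (hypothetical table).
const OUTCOME_BUCKETS = {
  positive: ['interview', 'offer', 'responded', 'applied'],
  negative: ['rejected', 'discarded'],
  self_filtered: ['skip'],
};

function classifyOutcome(status) {
  for (const [bucket, statuses] of Object.entries(OUTCOME_BUCKETS)) {
    if (statuses.includes(status)) return bucket;
  }
  // Assumption: anything else, including 'evaluated', is still pending.
  return 'pending';
}

// Keep only applications that ended negatively, ready for blocker counting.
function negativeOnly(entries) {
  return entries.filter(e => classifyOutcome(e.status) === 'negative');
}
```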

Extracting Structured Data from Markdown Reports

Once the tracker and outcomes are loaded, the script pulls structured signal from each markdown report. It prefers an explicit machine-readable summary and degrades gracefully to regex-based extraction when that summary is absent.

Machine Summary Block Parsing

The parseMachineSummary function (lines 96-108) looks for a fenced YAML or JSON block under a ## Machine Summary heading. It parses the block with js-yaml and retains only the fields declared in MACHINE_SUMMARY_FIELDS, such as company, role, score, hard_stops, and soft_gaps. This gives the script a clean, structured record when the user has provided one.

// analyze-patterns.mjs (lines 96-108)
import yaml from 'js-yaml';

const MACHINE_SUMMARY_FIELDS = [
  'company', 'role', 'score', 'hard_stops', 'soft_gaps'
];

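function parseMachineSummary(reportContent) {
  // Capture the body of the fenced yaml/json block under the heading
  const match = reportContent.match(
    /## Machine Summary\s*```(?:yaml|json)\r?\n([\s\S]*?)\r?\n```/
  );
  if (!match) return null;

  let parsed;
  try {
    // js-yaml also parses JSON, so one loader handles both fence types
    parsed = yaml.load(match[1]);
  } catch {
    return null; // malformed block: treat it as missing
  }
  if (!parsed || typeof parsed !== 'object') return null;

  // Keep only the whitelisted fields
  return Object.fromEntries(
    MACHINE_SUMMARY_FIELDS.map(f => [f, parsed[f]])
  );
}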

Fallback Regex Extraction

When a report lacks a Machine Summary, the script falls back to regex-based parsing of the classic markdown blocks (lines 27-60). These regexes read Block A metadata, the scoring table, and the gaps table. From them the script extracts the archetype, seniority, remote policy, team size, compensation, domain, scores such as cvMatch and global, and the full gap list.
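The exact report layout is not shown here, so the sketch below assumes Block A and the scoring table are two-column markdown tables with rows like | Archetype | Backend Engineer |. extractTableField, parseReportFallback, and the row labels are hypothetical, and gap-list parsing is omitted for brevity.

```javascript
// Hypothetical fallback: read the value from a "| Label | Value |" row.
function extractTableField(content, label) {
  // Escape regex metacharacters in the label before building the pattern
  const safe = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Case-insensitive, line-anchored; tolerates **bold** labels
  const re = new RegExp(`^\\|\\s*\\**${safe}\\**\\s*\\|\\s*([^|\\n]+?)\\s*\\|`, 'im');
  const match = content.match(re);
  return match ? match[1].trim() : null;
}

function parseReportFallback(content) {
  // Parse a numeric score such as "4.2/5"; non-numeric values become null
  const score = label => {
    const raw = extractTableField(content, label);
    const num = raw === null ? NaN : parseFloat(raw);
    return Number.isNaN(num) ? null : num;
  };
  return {
    archetype: extractTableField(content, 'Archetype'),
    seniority: extractTableField(content, 'Seniority'),
    remote: extractTableField(content, 'Remote'),
    cvMatch: score('CV Match'),
    global: score('Global'),
  };
}
```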

Identifying Rejection Patterns from Historical Application Data

With structured gaps in hand, the script turns to its core task: identifying rejection patterns in historical application data by isolating the hard stops that drove negative outcomes. This phase converts qualitative report text into a quantitative blocker histogram.

Categorizing Hard-Stop Gaps

The extractBlockerType function (lines 33-41) examines every gap entry. Soft gaps are ignored, while hard-stop descriptions are matched against keyword patterns to produce a discrete blocker category. The supported categories are geo-restriction, stack-mismatch, seniority-mismatch, onsite-requirement, and other.

// analyze-patterns.mjs (lines 33-41)
function extractBlockerType(gapDescription) {
  // Guard against gaps that arrive without a description
  const desc = String(gapDescription ?? '').toLowerCase();
  if (desc.includes('geo') || desc.includes('location'))
    return 'geo-restriction';
  if (desc.includes('stack') || desc.includes('tech'))
    return 'stack-mismatch';
  if (desc.includes('senior'))
    return 'seniority-mismatch';
  if (desc.includes('onsite') || desc.includes('office'))
    return 'onsite-requirement';
  return 'other';
}

Aggregating Blocker Frequencies

Across all reports, blocker categories are tallied into a blockerCounts histogram (lines 34-45). The script then normalizes these counts into frequency and percentage values, producing a ranked list of { blocker, frequency, percentage } records. This makes it trivial to see which rejection patterns dominate the dataset.

// analyze-patterns.mjs (lines 34-45)
const blockerCounts = {};

reports.forEach(report => {
  report.gaps?.forEach(gap => {
    if (gap.type === 'hard_stop') {
      const blocker = extractBlockerType(gap.description);
      blockerCounts[blocker] = (blockerCounts[blocker] || 0) + 1;
    }
  });
});

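const total = Object.values(blockerCounts).reduce((a, b) => a + b, 0);
const frequencyTable = Object.entries(blockerCounts)
  .map(([blocker, count]) => ({
    blocker,
    frequency: count,
    // toFixed returns a string; convert back so threshold checks compare numbers
    percentage: Number(((count / total) * 100).toFixed(1))
  }))
  // Rank the most frequent blockers first
  .sort((a, b) => b.frequency - a.frequency);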

Insights and Recommendations

Beyond raw blocker counts, the script segments data to uncover situational trends. Higher-level classification and threshold logic turn the aggregated numbers into concrete next steps.

Contextual Segmentation

Helper functions such as classifyRemote and classifyCompanySize (lines 102-130) group applications by remote policy, company size, and archetype. The script then computes conversion rates, score statistics, and a recommended minimum score threshold.
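Segmentation boils down to a classifier per dimension plus a per-segment tally. In this sketch the keyword rules in classifyRemote are assumptions, conversionBySegment is a hypothetical helper, and "conversion" is taken to mean the share of applications whose outcome bucket is positive.

```javascript
// Hypothetical keyword rules; the real classifyRemote may differ.
function classifyRemote(policy) {
  const p = String(policy ?? '').toLowerCase();
  if (p.includes('hybrid')) return 'hybrid';
  if (p.includes('remote')) return 'remote';
  if (p.includes('onsite') || p.includes('office')) return 'onsite';
  return 'unknown';
}

// Tally applications per segment and compute the positive-outcome rate.
// Each app is assumed to carry a precomputed `outcome` bucket.
function conversionBySegment(apps, segmentOf) {
  const stats = {};
  for (const app of apps) {
    const key = segmentOf(app);
    stats[key] ??= { total: 0, positive: 0 };
    stats[key].total += 1;
    if (app.outcome === 'positive') stats[key].positive += 1;
  }
  return Object.fromEntries(
    Object.entries(stats).map(([key, { total, positive }]) => [
      key,
      { total, positive, rate: positive / total },
    ])
  );
}
```

Swapping the segmentOf callback for a company-size or archetype classifier reuses the same tally for the other dimensions.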

Threshold-Based Recommendations

If a blocker type reaches a specific percentage threshold, such as ≥ 20 % for geo-restriction or ≥ 15 % for stack-mismatch, the script emits a concrete recommendation (lines 120-158). These suggestions point to actions such as tightening filters in portals.yml, raising the minimum application score, or refining CV keywords.

// analyze-patterns.mjs (lines 120-158)
function generateRecommendations(frequencyTable) {
  const recommendations = [];
  frequencyTable.forEach(({ blocker, percentage }) => {
    if (blocker === 'geo-restriction' && percentage >= 20) {
      recommendations.push(
        'Tighten portals.yml filters to exclude geo-restricted regions.'
      );
    }
    if (blocker === 'stack-mismatch' && percentage >= 15) {
      recommendations.push(
        'Raise the minimum score threshold or refine CV keywords.'
      );
    }
  });
  return recommendations;
}
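The if-chain above works for two rules, but every new blocker needs another branch. One alternative, not taken from the script, is to keep thresholds in a data table; THRESHOLDS and recommendFromThresholds are hypothetical names, and the two rules simply mirror the ones shown above.

```javascript
// Hypothetical threshold table mirroring the two rules above.
const THRESHOLDS = {
  'geo-restriction': {
    minPercent: 20,
    advice: 'Tighten portals.yml filters to exclude geo-restricted regions.',
  },
  'stack-mismatch': {
    minPercent: 15,
    advice: 'Raise the minimum score threshold or refine CV keywords.',
  },
};

function recommendFromThresholds(frequencyTable, thresholds = THRESHOLDS) {
  return frequencyTable
    .filter(({ blocker, percentage }) =>
      // Only blockers with a configured rule can trigger advice
      Object.hasOwn(thresholds, blocker) &&
      // Number() also accepts percentages stored as strings (e.g. from toFixed)
      Number(percentage) >= thresholds[blocker].minPercent
    )
    .map(({ blocker }) => thresholds[blocker].advice);
}
```

Keeping rules in data makes them easy to tune, or to load from a config file, without touching the control flow.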

Summary

  • analyze-patterns.mjs reads data/applications.md and every linked reports/…md file to build a complete historical dataset.
  • Raw statuses are normalized via normalizeStatus and ALIASES, then classified into outcome buckets by classifyOutcome.
  • Structured data is pulled from fenced Machine Summary blocks when available; otherwise regex fallbacks parse classic markdown sections.
  • Hard-stop gaps are categorized by extractBlockerType into blockers such as geo-restriction and stack-mismatch.
  • Blocker frequencies are aggregated into a percentage-ranked histogram.
  • Threshold logic in the recommendation engine suggests concrete changes to portals.yml, score cutoffs, or CV focus areas.

Frequently Asked Questions

What primary data source does analyze-patterns.mjs read?

The script targets data/applications.md inside the repository, falling back to an applications.md file at the repo root if the primary tracker is missing. Each row that starts with | is parsed into a structured entry by the parseTracker function.

How does the script handle reports without a Machine Summary?

When a report lacks a fenced YAML or JSON block under ## Machine Summary, analyze-patterns.mjs falls back to regex-based extraction of the classic markdown blocks (lines 27-60). This captures the scoring table, gap list, and metadata such as archetype, seniority, and remote policy.

Why does the analysis ignore soft gaps?

The extractBlockerType logic (lines 33-41) intentionally filters for hard-stop gaps only. Soft gaps represent negotiable concerns rather than definitive rejection reasons, so excluding them keeps the blocker histogram focused on the true drivers of negative outcomes.

How does the script decide which recommendations to show?

Recommendations are generated when a blocker crosses a percentage threshold—for example, ≥ 20 % geo-restriction or ≥ 15 % stack-mismatch (lines 120-158). When a threshold is breached, the script emits a specific action such as tightening portals.yml filters or raising the minimum application score.
