# How the SWE-Agent Architect Validates Research Hypotheses During the Planning Phase

> Learn how the SWE-Agent Architect validates research hypotheses with an evidence-driven pipeline and scoring mechanism. Ensure high-confidence candidates for your final plan.

- Repository: [LangTalks/swe-agent](https://github.com/langtalks/swe-agent)
- Tags: architecture
- Published: 2026-03-05

---

**The SWE-Agent Architect validates research hypotheses by orchestrating an evidence-driven pipeline that generates candidate explanations, retrieves concrete codebase evidence via the Researcher agent, and scores each hypothesis using the `validate_hypothesis` helper to ensure only high-confidence candidates with a score ≥ 0.75 are incorporated into the final plan.**

The Architect agent serves as the top-level planner in the [langtalks/swe-agent](https://github.com/langtalks/swe-agent) system, transforming high-level software engineering tasks into concrete, step-by-step execution plans. A critical component of this planning phase is ensuring that every research hypothesis is grounded in observable codebase data rather than speculative LLM output. This article examines the validation loop implemented in the Architect agent to filter and refine hypotheses before execution.

## The Six-Stage Hypothesis Validation Workflow

The Architect agent implements a rigorous, multi-stage validation process to ensure plan reliability. This workflow is orchestrated through the core logic in [`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py) and specialized scoring utilities in [`agents/architect/validation.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/validation.py).

### 1. Hypothesis Generation via Prompt Templates

The validation process begins with the Architect using structured prompt templates defined in [`agents/architect/prompts.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/prompts.py) to ask the LLM for one or more hypotheses. These hypotheses propose root causes for the issue, identify relevant code sections requiring examination, and suggest potential design changes. Each generated hypothesis receives a unique identifier so it can be tracked throughout the planning pipeline.
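As a minimal sketch of the identifier step, the snippet below tags raw LLM statements with sequential IDs. The `Hypothesis` shape, the `tag_hypotheses` helper, and the `hypo_N` naming scheme are assumptions made for illustration; the repository's own dataclass in `agents/architect/types.py` may look different.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical shape; not copied from agents/architect/types.py."""
    hypothesis_id: str
    statement: str


def tag_hypotheses(statements, prefix="hypo"):
    """Attach a sequential, unique ID to each non-empty raw statement."""
    counter = count(1)
    return [
        Hypothesis(hypothesis_id=f"{prefix}_{next(counter)}", statement=s.strip())
        for s in statements
        if s.strip()  # skip blank entries in the LLM output
    ]


tagged = tag_hypotheses([
    "Missing zero guard in divide()",
    "",
    "Test fixture passes a string instead of a float",
])
for hypothesis in tagged:
    print(hypothesis.hypothesis_id, "->", hypothesis.statement)
```

Stable IDs like these are what make it possible to link plan steps back to the hypothesis that justified them.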

### 2. Evidence Retrieval Through the Researcher Agent

Before accepting any hypothesis, the Architect invokes the **Researcher** agent via the internal `research` method in [`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py). This agent performs deep codebase analysis to gather concrete *research results*, including file snippets, error traces, relevant commit messages, and documentation references. The Researcher agent implementation in [`agents/researcher/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/researcher/agent.py) ensures the Architect receives factual, observable data rather than theoretical assumptions.

### 3. Scoring and Consistency Validation

Each hypothesis is evaluated against the retrieved evidence using the `validate_hypothesis` function in [`agents/architect/validation.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/validation.py). The scoring algorithm assesses three critical dimensions:

- **Presence** – Verifies whether the evidence contains the specific symbols, imports, or error patterns mentioned in the hypothesis.
- **Relevance** – Checks temporal and logical proximity to the failure, such as recent changes in the same module.
- **Confidence** – Combines the LLM’s self-reported confidence with a heuristic match-ratio to produce a final validation score between 0.0 and 1.0.
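The article does not spell out the exact scoring formula, so the following is only one possible reading of the three dimensions, not the repository's implementation. The names `presence_ratio`, `blend_score`, and `llm_weight` are hypothetical, and the equal weighting is an assumption.

```python
def presence_ratio(symbols, evidence_text):
    """Fraction of hypothesis symbols that literally appear in the evidence."""
    if not symbols:
        return 0.0
    hits = sum(1 for symbol in symbols if symbol in evidence_text)
    return hits / len(symbols)


def blend_score(presence, relevance, llm_confidence, llm_weight=0.5):
    """Blend LLM self-reported confidence with a heuristic match ratio.

    All inputs are expected in [0.0, 1.0]; the result is clamped to that range.
    The weighting scheme is an assumption made for illustration.
    """
    match_ratio = (presence + relevance) / 2
    score = llm_weight * llm_confidence + (1 - llm_weight) * match_ratio
    return max(0.0, min(1.0, score))


evidence = "def divide(a, b):\n    return a / b  # raises ZeroDivisionError when b == 0"
presence = presence_ratio(["divide", "ZeroDivisionError"], evidence)
score = blend_score(presence, relevance=0.5, llm_confidence=0.8)
print(round(score, 3))
```

Keeping the heuristic match ratio separate from the LLM's self-reported confidence makes it easy to see when a hypothesis scores well only because the model was confident.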

### 4. Threshold-Based Acceptance and Refinement

Hypotheses achieving a validation score **≥ 0.75** are marked as *validated* and proceed to plan enrichment. Those falling below this threshold are either discarded entirely or routed through the `refine_hypothesis` loop in [`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py), where the Architect asks the LLM to reformulate the hypothesis in light of the contradicting evidence.
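The acceptance rule can be sketched as a three-way split. Only the 0.75 threshold comes from the text above; the `triage` helper and the `REFINE_FLOOR` cutoff that separates "refine" from "discard" are hypothetical, since the article does not say how that choice is made.

```python
VALIDATION_THRESHOLD = 0.75  # threshold stated in the article
REFINE_FLOOR = 0.40          # hypothetical: below this, refinement is not attempted


def triage(scores, threshold=VALIDATION_THRESHOLD, refine_floor=REFINE_FLOOR):
    """Partition {hypothesis_id: score} into validated, refine, and discard lists."""
    buckets = {"validated": [], "refine": [], "discard": []}
    for hypothesis_id, score in scores.items():
        if score >= threshold:
            buckets["validated"].append(hypothesis_id)
        elif score >= refine_floor:
            buckets["refine"].append(hypothesis_id)
        else:
            buckets["discard"].append(hypothesis_id)
    return buckets


result = triage({"hypo_1": 0.91, "hypo_2": 0.75, "hypo_3": 0.55, "hypo_4": 0.10})
print(result)
```

Note that a score of exactly 0.75 counts as validated, matching the "≥ 0.75" rule.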

### 5. Plan Enrichment with Validation Metadata

Validated hypotheses are integrated into the execution plan as explicit preconditions. The Architect annotates each `PlanStep`, defined in [`agents/architect/types.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/types.py), with the corresponding `hypothesis_id`. This creates an audit trail that lets downstream agents such as the Synthesizer and Tester reference the original justification for every planned action.
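To illustrate how a `hypothesis_id` annotation supports that audit trail, here is a small sketch. The `PlanStep` fields shown are an illustrative stand-in for the real type, and `justification_for` is a hypothetical helper, not a function from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    """Illustrative stand-in for the PlanStep type in agents/architect/types.py."""
    description: str
    hypothesis_id: str


def justification_for(step, hypotheses_by_id):
    """Return the hypothesis text that justifies a step, or None if it is unknown."""
    return hypotheses_by_id.get(step.hypothesis_id)


hypotheses_by_id = {"hypo_3": "ZeroDivisionError is caused by missing guard"}
plan = [
    PlanStep("Inspect the division function in utils/math.py", "hypo_3"),
    PlanStep("Add guard clause and run test suite", "hypo_3"),
]
reasons = [justification_for(step, hypotheses_by_id) for step in plan]
print(reasons)
```

A downstream agent holding only a `PlanStep` can recover the reasoning behind it with a single lookup.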

### 6. Execution Feedback and Adaptive Replanning

After initial plan execution, the **Tester** agent reports any failures back to the Architect. The Architect then re-runs the validation pipeline on remaining or newly-generated hypotheses, enabling the system to adapt dynamically if earlier assumptions prove incorrect. This feedback loop ensures the planning phase remains responsive to real-world execution results.
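The feedback loop might look roughly like the sketch below. It assumes that hypotheses tied to failing steps are dropped and the rest are re-scored; the article does not confirm that policy, and `replan` and its `rescore` callback are hypothetical names standing in for a full validation pass.

```python
def replan(hypothesis_ids, failed_ids, rescore, threshold=0.75):
    """Drop hypotheses tied to failed steps and re-validate the rest.

    `rescore` is a caller-supplied function mapping a hypothesis ID to a fresh
    score in [0.0, 1.0]; it stands in for a full validation pass.
    """
    remaining = [h for h in hypothesis_ids if h not in failed_ids]
    return [h for h in remaining if rescore(h) >= threshold]


fresh_scores = {"hypo_1": 0.82, "hypo_2": 0.60, "hypo_3": 0.95}
survivors = replan(
    hypothesis_ids=["hypo_1", "hypo_2", "hypo_3"],
    failed_ids={"hypo_3"},  # assumed: the Tester reported a failure on a hypo_3 step
    rescore=fresh_scores.get,
)
print(survivors)
```

Passing the scorer in as a callback keeps the replanning logic independent of how evidence is gathered or scored.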

## Core Implementation Files and Functions

Understanding the validation architecture requires familiarity with these specific source files:

- **[`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py)** – Contains the `ArchitectAgent` class with the `generate_hypotheses`, `research`, and `refine_hypothesis` methods that orchestrate the validation workflow.
- **[`agents/architect/validation.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/validation.py)** – Implements the `validate_hypothesis` scoring function and associated heuristics for comparing hypotheses against evidence.
- **[`agents/architect/types.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/types.py)** – Defines the `Hypothesis` and `PlanStep` dataclasses that carry validation metadata through the planning pipeline.
- **[`agents/researcher/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/researcher/agent.py)** – Provides the evidence-gathering service that feeds the Architect's validation logic.
- **[`agents/architect/prompts.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/prompts.py)** – Houses the prompt templates used to solicit hypotheses and refinements from the underlying LLM.

## Practical Example: Validating a Hypothesis in Code

The following example demonstrates how to instantiate the Architect agent and run a hypothesis through the complete validation pipeline:

```python
# Example: how the Architect validates a hypothesis
from agents.architect.agent import ArchitectAgent
from agents.architect.validation import validate_hypothesis

architect = ArchitectAgent()
task_description = "Fix the failing unit test in `utils/math.py`"

# 1. Generate hypotheses
hypotheses = architect.generate_hypotheses(task_description)

# 2. Retrieve supporting evidence
evidence = architect.research(hypotheses)

# 3. Validate each hypothesis against the evidence
VALIDATION_THRESHOLD = 0.75
validated = []
for hypo in hypotheses:
    # validate_hypothesis is the scoring helper in agents/architect/validation.py
    score = validate_hypothesis(hypo, evidence)
    if score >= VALIDATION_THRESHOLD:
        validated.append(hypo)

# 4. Build the final plan using only validated hypotheses
plan = architect.build_plan(validated)
print(plan)
```

*Simplified output showing validated plan steps:*

```text
PlanStep(
    description="Inspect the division function in utils/math.py",
    hypothesis_id="hypo_3",   # validated hypothesis: “ZeroDivisionError is caused by missing guard”
)
PlanStep(
    description="Add guard clause and run test suite",
    hypothesis_id="hypo_3",
)
```

## Summary

- The Architect agent validates hypotheses through an **evidence-driven six-stage pipeline** that ensures plans are grounded in concrete codebase data.
- Validation relies on the **Researcher agent** (via the `research` method) to gather file snippets, error traces, and commit history before scoring.
- The **`validate_hypothesis`** function in [`agents/architect/validation.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/validation.py) scores each candidate on Presence, Relevance, and Confidence; only candidates scoring **0.75 or higher** are accepted.
- Hypotheses below the threshold are either discarded or routed through the **`refine_hypothesis`** loop for iterative improvement.
- Validated hypotheses are embedded into **`PlanStep`** objects via the `hypothesis_id` field, creating traceable links between plan actions and their supporting evidence.

## Frequently Asked Questions

### What validation score threshold does the Architect agent use?

According to the source code in [`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py), the Architect agent applies a hard threshold of **0.75** when evaluating hypothesis validation scores. Hypotheses scoring at or above this value are marked as validated and incorporated into the execution plan, while those below the threshold are either discarded or sent back for refinement.

### How does the Architect agent gather evidence for hypothesis validation?

The Architect delegates evidence gathering to the **Researcher** agent through the internal `research` method defined in [`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py). The Researcher searches the codebase, documentation, and test failure logs to return structured research results containing file snippets, error traces, and relevant commit messages, which the Architect then uses to score hypotheses in [`agents/architect/validation.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/validation.py).

### What happens to hypotheses that fail validation?

Hypotheses with validation scores below 0.75 are not added to the plan. The Architect either discards the low-confidence hypothesis entirely or routes it through the **`refine_hypothesis`** loop in [`agents/architect/agent.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/agent.py), which prompts the LLM to generate a revised version that accounts for the contradicting evidence gathered by the Researcher agent.

### Which file contains the core validation scoring logic?

The core scoring logic resides in **[`agents/architect/validation.py`](https://github.com/langtalks/swe-agent/blob/main/agents/architect/validation.py)**, which implements the `validate_hypothesis` function. This module contains the heuristic algorithms that check for **Presence**, **Relevance**, and **Confidence** by comparing the symbolic references and error patterns mentioned in each hypothesis against the concrete evidence retrieved from the codebase.