# How the Developer Agent Generates and Applies Code Diffs in SWE-agent

> Learn how the SWE-agent Developer agent generates and applies code diffs using LLM prompts and structured code blocks, modifying files without external diff libraries.

- Repository: [LangTalks/swe-agent](https://github.com/langtalks/swe-agent)
- Tags: how-to-guide
- Published: 2026-03-05

---

**The Developer agent generates code diffs by prompting an LLM to produce structured `<code_change_request>` blocks containing numbered original snippets and replacement code, then parses line numbers to splice edits directly into files without external diff libraries.**

The [langtalks/swe-agent](https://github.com/langtalks/swe-agent) repository implements an autonomous software engineering agent that modifies source code through a specialized pipeline. Unlike traditional tools that rely on unified diff patches, the Developer agent uses a LangGraph workflow to generate and apply file changes through explicit line-based splicing.

## LangGraph Workflow Architecture

The diff generation pipeline is orchestrated through a stateful LangGraph workflow defined in [`agent/developer/graph.py`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/graph.py). The compiled graph `swe_developer` chains specialized nodes that transform high-level tasks into concrete file modifications.

### Core Workflow Nodes

The workflow progresses through distinct phases connected by conditional edges:

- **`start_implementing`** – Initializes the graph and selects the first atomic task from the task queue
- **`prepare_for_implementation`** – Loads the target file content (or marks it as new) and resets prior research context
- **`get_clear_implementation_plan_runnable`** – Generates a detailed implementation plan using LLM reasoning
- **`creating_diffs_for_task`** – Extracts structured diff specifications from the LLM or creates new file content
- **`proceed_to_next_atomic_task`** – Advances to the next task or terminates the workflow

The workflow is compiled into the runnable `swe_developer` with configuration tags for versioning and tracing.
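The node order above can be pictured as a loop over the task queue. The following is a plain-Python sketch of that ordering only, not LangGraph code: the node names come from the list above, while `run_developer_loop` and the trace format are hypothetical, and the real graph uses conditional edges rather than a fixed loop.

```python
from collections import deque

# Node names are taken from the workflow description; the loop itself is a
# simplified stand-in for the graph's conditional edges.
PER_TASK_NODES = (
    "prepare_for_implementation",
    "get_clear_implementation_plan_runnable",
    "creating_diffs_for_task",
    "proceed_to_next_atomic_task",
)


def run_developer_loop(tasks):
    """Return the (node, task) visits in the order the workflow describes."""
    queue = deque(tasks)
    trace = [("start_implementing", None)]
    while queue:
        task = queue.popleft()
        for node in PER_TASK_NODES:
            trace.append((node, task))
    return trace


trace = run_developer_loop(["add logging", "fix typo"])
```

Each atomic task passes through the same four per-task nodes before the next one is picked up, and the run ends when the queue is empty.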

## Generating Structured Diff Specifications

The agent produces machine-readable edit instructions through a carefully engineered prompt template that constrains LLM outputs to a specific XML-like format.

### The Prompt Template

Located at [`agent/developer/prompts/create_diff_prompt.md`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/prompts/create_diff_prompt.md), the prompt instructs the LLM to wrap each modification in `<code_change_request>` blocks containing:

- `original_code_snippet` – The exact lines from the current file prefixed with line numbers (e.g., `"123| def foo():"`)
- `edit_code_snippet` – The replacement code without line numbers

This template is loaded into a LangChain runnable using the helper in [`helpers/prompts.py`](https://github.com/langtalks/swe-agent/blob/main/helpers/prompts.py):

```python
extract_diffs_tasks_prompt = markdown_to_prompt_template(
    "agent/developer/prompts/create_diff_prompt.md"
)
extract_diff_runnable = extract_diffs_tasks_prompt | ChatAnthropic(...) | StrOutputParser()
```

### LLM Invocation with Context

Inside the `creating_diffs_for_task` node, the agent invokes the runnable with pre-numbered file content and task context:

```python
diffs_tasks = extract_diff_runnable.invoke({
    "task": current_atomic_task.atomic_task,
    "additional_context": current_atomic_task.additional_context,
    "research": convert_tools_messages_to_ai_and_human(state.atomic_implementation_research),
    "file_path": file_path,
    "file_content": file_content,  # Lines prefixed with "1| ", "2| ", etc.
    "output_format": JsonOutputParser(pydantic_object=Diffs).get_format_instructions(),
})

```

The LLM returns a string containing one or more `<code_change_request>` blocks that specify exactly what to change and where.
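The `file_content` value passed to the runnable carries a line-number prefix on every line. A minimal sketch of producing that form follows; `number_lines` is a hypothetical helper name, and the repository's own numbering code may differ in detail.

```python
def number_lines(text: str) -> str:
    """Prefix every line with its 1-based line number followed by '| '."""
    return "\n".join(
        f"{number}| {line}"
        for number, line in enumerate(text.splitlines(), start=1)
    )


source = "def foo():\n    return 1\n"
numbered = number_lines(source)
# The first numbered line is "1| def foo():"
```

Numbering starts at 1 so that the prefixes line up with the `first_line - 1` offset used when the edit is spliced back into the file.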

## Parsing and Applying Code Diffs

Rather than generating unified diff patches, the agent parses the structured blocks and applies edits through direct line-based splicing in Python.

### Extracting Change Blocks

The `creating_diffs_for_task` function (lines 66-90 in [`agent/developer/graph.py`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/graph.py)) uses regular expressions to isolate individual modifications:

```python
blocks = re.findall(
    r"<code_change_request>(.*?)</code_change_request>", 
    diffs_tasks, 
    re.DOTALL
)

for block in blocks:
    match = re.search(
        r"original_code_snippet:\s*(.*?)\s*edit_code_snippet:\s*(.*)",
        block,
        re.DOTALL,
    )
    if match:
        original_code = match.group(1).strip()
        edited_code = match.group(2).strip()
```
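The extraction step can be tried in isolation with a self-contained sketch. The sample response below is hypothetical, and the real prompt may ask for a somewhat different layout. The sketch also deviates from the excerpt in one place: with `\s*` after `edit_code_snippet:` followed by `.strip()`, the indentation of the first replacement line is consumed, so this variant only skips the rest of the label line and uses `rstrip()`.

```python
import re

# Hypothetical LLM response used only for illustration.
SAMPLE_RESPONSE = """
<code_change_request>
original_code_snippet:
2|     return 1
edit_code_snippet:
    return 2
</code_change_request>
"""

BLOCK_RE = re.compile(
    r"<code_change_request>(.*?)</code_change_request>", re.DOTALL
)
# After the second label, consume only the remainder of that line so the
# first replacement line keeps its leading indentation.
FIELDS_RE = re.compile(
    r"original_code_snippet:\s*(.*?)\s*edit_code_snippet:[ \t]*\r?\n?(.*)",
    re.DOTALL,
)


def parse_change_requests(response: str) -> list[tuple[str, str]]:
    """Return (original_snippet, replacement) pairs for well-formed blocks."""
    pairs = []
    for block in BLOCK_RE.findall(response):
        match = FIELDS_RE.search(block)
        if match is None:
            continue  # skip malformed blocks instead of raising
        pairs.append((match.group(1).strip(), match.group(2).rstrip()))
    return pairs


pairs = parse_change_requests(SAMPLE_RESPONSE)
```

Whether preserved indentation matters depends on how the replacement code is laid out in the real responses; for indentation-sensitive languages such as Python it is worth checking.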

### Line Number Resolution

The agent extracts absolute line numbers from the numbered prefixes in the original snippet:

```python
orig_lines = original_code.splitlines()
first_line = int(orig_lines[0].split("|")[0].strip())
last_line = int(orig_lines[-1].split("|")[0].strip())
```
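Because the range comes straight from text the LLM produced, a defensive variant can reject snippets whose prefixes are missing or inconsistent. This is a sketch under that assumption; `resolve_line_range` is a hypothetical name and the validation rules are not taken from the repository.

```python
def resolve_line_range(original_code: str) -> tuple[int, int]:
    """Extract the 1-based (first, last) line numbers from a numbered snippet."""
    lines = [line for line in original_code.splitlines() if line.strip()]
    if not lines:
        raise ValueError("original snippet is empty")
    try:
        # Split once so a '|' inside the code itself is left untouched.
        first = int(lines[0].split("|", 1)[0].strip())
        last = int(lines[-1].split("|", 1)[0].strip())
    except ValueError as exc:
        raise ValueError("snippet lines must start with '<number>|'") from exc
    if first < 1 or last < first:
        raise ValueError(f"invalid line range {first}-{last}")
    return first, last


line_range = resolve_line_range("12| def foo():\n13|     return a | b")
```

Failing early here keeps a malformed snippet from turning into a splice at the wrong position.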

### In-Place File Splicing

Using the extracted indices, the agent performs an in-place replacement by reconstructing the file array:

```python
new_content = (
    file_content.splitlines()[: first_line - 1] +
    edited_code.splitlines() +
    file_content.splitlines()[last_line:]
)

with open(file_path, "w") as f:
    f.write("\n".join(new_content))
```

This approach bypasses external diff libraries entirely (though `diff_match_patch` is imported for potential future use) and gives the agent line-level control over where each edit lands: the replacement covers whole lines from `first_line` through `last_line`, inclusive.
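The splice can be written as a pure function on strings, which makes it easy to test. The sketch below assumes the content being spliced is the raw file text rather than the numbered copy sent to the LLM. It also adds two things the excerpt does not show: it restores a trailing newline that `"\n".join` would drop, and it applies several edits from the bottom of the file upward. The article does not say how multiple blocks for one file are ordered; bottom-up application is one way to keep line numbers valid when all blocks are numbered against the same original content. `splice_lines` and `apply_edits` are hypothetical names.

```python
def splice_lines(content: str, first_line: int, last_line: int, edited_code: str) -> str:
    """Replace lines first_line..last_line (1-based, inclusive) with edited_code."""
    lines = content.splitlines()
    if not 1 <= first_line <= last_line <= len(lines):
        raise ValueError(f"range {first_line}-{last_line} is outside the file")
    new_lines = lines[: first_line - 1] + edited_code.splitlines() + lines[last_line:]
    result = "\n".join(new_lines)
    # "\n".join drops a final newline, so put it back if the input had one.
    if content.endswith("\n"):
        result += "\n"
    return result


def apply_edits(content: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply non-overlapping edits numbered against the original content."""
    # Highest line numbers first, so earlier ranges are not shifted.
    for first, last, code in sorted(edits, key=lambda edit: edit[0], reverse=True):
        content = splice_lines(content, first, last, code)
    return content


original = "a\nb\nc\nd\n"
single = splice_lines(original, 2, 3, "X\nY\nZ")
multiple = apply_edits(original, [(1, 1, "A1\nA2"), (4, 4, "D")])
```

Overlapping ranges are not handled here; a caller would need to reject or merge them before applying.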

## Handling New File Creation

When the target path does not exist, the agent switches to a separate generation path using [`agent/developer/prompts/implement_new_file_prompt.md`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/prompts/implement_new_file_prompt.md). The `creating_diffs_for_task` node detects missing files and invokes a dedicated runnable to synthesize complete file content:

```python
new_file_content = create_new_file_runnable.invoke({...})

with open(file_path, "w") as file:
    file.write(new_file_content)
```

New files bypass the diff parsing logic entirely, writing the LLM-generated content directly to disk.
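A new file may live in a directory that does not exist yet, in which case a bare `open(file_path, "w")` raises `FileNotFoundError`. The sketch below shows one way to guard against that; `write_new_file` is a hypothetical helper, and whether the repository creates parent directories is not stated in the source.

```python
from pathlib import Path


def write_new_file(file_path: str, new_file_content: str) -> Path:
    """Write generated content to a path that must not already exist."""
    path = Path(file_path)
    if path.exists():
        # Existing files belong to the diff path, not the new-file path.
        raise FileExistsError(f"{path} already exists")
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
    path.write_text(new_file_content, encoding="utf-8")
    return path
```

Refusing to overwrite an existing file keeps the two code paths separate: anything that already exists goes through the numbered-snippet splice instead.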

## State Management with Pydantic Models

The agent maintains workflow state through the `SoftwareDeveloperState` class defined in [`agent/developer/state.py`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/state.py). This state tracks:

- The current file path and pre-numbered content
- Research messages accumulated during the optional tool-calling loop
- A list of `DiffTask` objects stored in the `Diffs` Pydantic model

Because every node reads from and writes to this typed state, the node functions themselves stay stateless: all context carried from `prepare_for_implementation` through `creating_diffs_for_task` lives in the state object that LangGraph passes between them.
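The shape of that state can be illustrated with standard-library dataclasses standing in for the Pydantic models. Only `DiffTask`, `Diffs`, and `atomic_implementation_research` are names that appear in the source; `DeveloperStateSketch` and every other field name here are hypothetical, and the real `SoftwareDeveloperState` may be organized differently.

```python
from dataclasses import dataclass, field, replace


@dataclass
class DiffTask:
    # Field names mirror the two labels in a <code_change_request> block.
    original_code_snippet: str
    edit_code_snippet: str


@dataclass
class Diffs:
    diffs: list[DiffTask] = field(default_factory=list)


@dataclass
class DeveloperStateSketch:
    # Hypothetical field names, apart from atomic_implementation_research.
    file_path: str = ""
    file_content: str = ""  # numbered form, e.g. "1| import os"
    atomic_implementation_research: list[str] = field(default_factory=list)
    diffs: Diffs = field(default_factory=Diffs)


def prepare_for_implementation(file_path: str, numbered_content: str) -> dict:
    """Return only the fields this step changes, as a partial update."""
    return {
        "file_path": file_path,
        "file_content": numbered_content,
        "atomic_implementation_research": [],  # reset prior research
    }


state = DeveloperStateSketch(atomic_implementation_research=["old note"])
state = replace(state, **prepare_for_implementation("app.py", "1| x = 1"))
```

Returning a partial update and merging it into the state mirrors the style LangGraph nodes commonly use, where each node reports only the fields it changed.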

## Summary

- The Developer agent uses a LangGraph workflow in [`agent/developer/graph.py`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/graph.py) to orchestrate file modifications through discrete, testable nodes
- Diff generation relies on [`agent/developer/prompts/create_diff_prompt.md`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/prompts/create_diff_prompt.md) to constrain LLM outputs into `<code_change_request>` blocks with numbered original snippets
- Line numbers embedded in the original code (e.g., `"42| def function():"`) enable precise range extraction without fuzzy matching algorithms
- The `creating_diffs_for_task` function parses these blocks and applies edits via Python list splicing: `content[:start-1] + new_lines + content[end:]`
- New files bypass the diff mechanism entirely, using [`agent/developer/prompts/implement_new_file_prompt.md`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/prompts/implement_new_file_prompt.md) to generate complete content from scratch
- State persistence is handled by the `SoftwareDeveloperState` and `Diffs` Pydantic models in [`agent/developer/state.py`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/state.py)

## Frequently Asked Questions

### How does the Developer agent ensure precise code replacements without using traditional diff algorithms?

The agent requires the LLM to include exact line numbers (prefixed as `"123| "`) in the `original_code_snippet` field. It extracts the first and last line numbers to determine the exact slice indices, then replaces that range directly in the file's line array. This avoids the ambiguity of context-based diff matching by relying on explicit line addressing provided by the LLM.

### What format must the LLM use when generating code changes?

The LLM must output one or more `<code_change_request>` blocks, each containing two labeled sections: `original_code_snippet` with numbered lines from the current file, and `edit_code_snippet` with the replacement code without line numbers. This structure is specified by the prompt template in [`agent/developer/prompts/create_diff_prompt.md`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/prompts/create_diff_prompt.md); the parser only acts on blocks in which both labels are found.

### How does the agent handle files that don't exist yet?

If the target file path does not exist, the `creating_diffs_for_task` node detects this condition and switches to the new-file workflow. It invokes a separate prompt ([`agent/developer/prompts/implement_new_file_prompt.md`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/prompts/implement_new_file_prompt.md)) to generate the complete file content, then writes it directly using standard file I/O without parsing diff blocks or extracting line numbers.

### Where is the workflow state stored during diff generation?

State is managed through the `SoftwareDeveloperState` Pydantic model in [`agent/developer/state.py`](https://github.com/langtalks/swe-agent/blob/main/agent/developer/state.py). This includes the current atomic task, numbered file content, research history, and parsed `Diffs` objects. LangGraph passes this state between nodes such as `prepare_for_implementation` and `creating_diffs_for_task`, so each node sees the results of the ones before it; if the graph is compiled with a checkpointer, a run can also resume from a saved point.