How the Developer Agent Generates and Applies Code Diffs in SWE-agent

The Developer agent generates code diffs by prompting an LLM to produce structured <code_change_request> blocks containing numbered original snippets and replacement code, then parses line numbers to splice edits directly into files without external diff libraries.

The langtalks/swe-agent repository implements an autonomous software engineering agent that modifies source code through a specialized pipeline. Unlike traditional tools that rely on unified diff patches, the Developer agent uses a LangGraph workflow to generate and apply file changes through explicit line-based splicing.

LangGraph Workflow Architecture

The diff generation pipeline is orchestrated through a stateful LangGraph workflow defined in agent/developer/graph.py. The compiled graph swe_developer chains specialized nodes that transform high-level tasks into concrete file modifications.

Core Workflow Nodes

The workflow progresses through distinct phases connected by conditional edges:

  • start_implementing – Initializes the graph and selects the first atomic task from the task queue
  • prepare_for_implementation – Loads the target file content (or marks it as new) and resets prior research context
  • get_clear_implementation_plan_runnable – Generates a detailed implementation plan using LLM reasoning
  • creating_diffs_for_task – Extracts structured diff specifications from the LLM or creates new file content
  • proceed_to_next_atomic_task – Advances to the next task or terminates the workflow

The workflow is compiled into the runnable swe_developer with configuration tags for versioning and tracing.
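The node ordering described above can be pictured as a plain loop. The following is only a rough sketch that does not use LangGraph: SketchState and run_workflow are hypothetical names, and the optional research loop and conditional edges are collapsed into a fixed sequence.

```python
from dataclasses import dataclass, field


@dataclass
class SketchState:
    """Hypothetical stand-in for the real workflow state."""
    tasks: list[str]
    current_index: int = 0
    log: list[str] = field(default_factory=list)


def run_workflow(state: SketchState) -> SketchState:
    # start_implementing: select the first atomic task
    state.current_index = 0
    state.log.append("start_implementing")
    while state.current_index < len(state.tasks):
        task = state.tasks[state.current_index]
        # The three per-task nodes, in the order the article lists them
        for node in (
            "prepare_for_implementation",
            "get_clear_implementation_plan_runnable",
            "creating_diffs_for_task",
        ):
            state.log.append(f"{node}: {task}")
        # proceed_to_next_atomic_task: advance, or fall out of the loop
        state.current_index += 1
        state.log.append("proceed_to_next_atomic_task")
    return state


result = run_workflow(SketchState(tasks=["add logging", "fix typo"]))
```

Each task passes through the same three nodes before the loop decides whether another task remains, which is the role the article assigns to proceed_to_next_atomic_task.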

Generating Structured Diff Specifications

The agent produces machine-readable edit instructions through a carefully engineered prompt template that constrains LLM outputs to a specific XML-like format.

The Prompt Template

Located at agent/developer/prompts/create_diff_prompt.md, the prompt instructs the LLM to wrap each modification in <code_change_request> blocks containing:

  • original_code_snippet – The exact lines from the current file prefixed with line numbers (e.g., "123| def foo():")
  • edit_code_snippet – The replacement code without line numbers

This template is loaded into a LangChain runnable using the helper in helpers/prompts.py:

extract_diffs_tasks_prompt = markdown_to_prompt_template(
    "agent/developer/prompts/create_diff_prompt.md"
)
extract_diff_runnable = extract_diffs_tasks_prompt | ChatAnthropic(...) | StrOutputParser()

LLM Invocation with Context

Inside the creating_diffs_for_task node, the agent invokes the runnable with pre-numbered file content and task context:

diffs_tasks = extract_diff_runnable.invoke({
    "task": current_atomic_task.atomic_task,
    "additional_context": current_atomic_task.additional_context,
    "research": convert_tools_messages_to_ai_and_human(state.atomic_implementation_research),
    "file_path": file_path,
    "file_content": file_content,  # Lines prefixed with "1| ", "2| ", etc.
    "output_format": JsonOutputParser(pydantic_object=Diffs).get_format_instructions()
})

The LLM returns a string containing one or more <code_change_request> blocks that specify exactly what to change and where.
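The pre-numbered file content passed to the LLM could be produced by a helper along these lines. This is a minimal sketch: add_line_numbers is a hypothetical name, and the repository's own numbering code may differ in detail.

```python
def add_line_numbers(text: str) -> str:
    """Prefix each line with its 1-based number, e.g. '1| import os'."""
    return "\n".join(
        f"{number}| {line}"
        for number, line in enumerate(text.splitlines(), start=1)
    )


numbered = add_line_numbers("import os\n\nprint(os.getcwd())")
print(numbered)
```

Numbering from 1 matches the "123| def foo():" convention that the prompt asks the LLM to echo back in original_code_snippet.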

Parsing and Applying Code Diffs

Rather than generating unified diff patches, the agent parses the structured blocks and applies edits through direct line-based splicing in Python.

Extracting Change Blocks

The creating_diffs_for_task function (lines 66-90 in agent/developer/graph.py) uses regular expressions to isolate individual modifications:

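import re

# Collect the body of every <code_change_request> block in the LLM response.
blocks = re.findall(
    r"<code_change_request>(.*?)</code_change_request>",
    diffs_tasks,
    re.DOTALL,
)

for block in blocks:
    # Split each block into its numbered original and its replacement.
    match = re.search(
        r"original_code_snippet:\s*(.*?)\s*edit_code_snippet:\s*(.*)",
        block,
        re.DOTALL,
    )
    if match:
        original_code = match.group(1).strip()
        edited_code = match.group(2).strip()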
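To see the two regular expressions at work, here is a self-contained sketch that runs them over a made-up response. The sample text and the parse_change_requests helper are illustrative assumptions; real LLM output may differ in whitespace and wording.

```python
import re

# Hypothetical LLM response used only for illustration.
sample_response = """
<code_change_request>
original_code_snippet:
2| def greet():
3|     print("hi")
edit_code_snippet:
def greet(name):
    print(f"hi {name}")
</code_change_request>
"""

BLOCK_RE = re.compile(r"<code_change_request>(.*?)</code_change_request>", re.DOTALL)
FIELDS_RE = re.compile(
    r"original_code_snippet:\s*(.*?)\s*edit_code_snippet:\s*(.*)", re.DOTALL
)


def parse_change_requests(response: str) -> list[tuple[str, str]]:
    """Return (original_code, edited_code) pairs for every well-formed block."""
    pairs = []
    for block in BLOCK_RE.findall(response):
        match = FIELDS_RE.search(block)
        if match:  # blocks that do not follow the format are skipped
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs


changes = parse_change_requests(sample_response)
```

One consequence of the pattern worth knowing: the \s* after edit_code_snippet: and the final .strip() both remove leading whitespace, so indentation on the first replacement line is lost while later lines keep theirs.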

Line Number Resolution

The agent extracts absolute line numbers from the numbered prefixes in the original snippet:

orig_lines = original_code.splitlines()
first_line = int(orig_lines[0].split("|")[0].strip())
last_line = int(orig_lines[-1].split("|")[0].strip())
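The extraction assumes the LLM reproduced the numeric prefixes faithfully. A more defensive variant might look like the sketch below; resolve_line_range and its error messages are hypothetical and not taken from the repository.

```python
def resolve_line_range(original_code: str) -> tuple[int, int]:
    """Return (first_line, last_line) from a snippet whose lines start with 'N| '."""
    lines = original_code.splitlines()
    if not lines:
        raise ValueError("original_code_snippet is empty")
    try:
        # Split only on the first '|' so code containing '|' is unaffected.
        first = int(lines[0].split("|", 1)[0].strip())
        last = int(lines[-1].split("|", 1)[0].strip())
    except ValueError as exc:
        raise ValueError("snippet lines must start with '<number>|'") from exc
    if first < 1 or last < first:
        raise ValueError(f"invalid line range {first}-{last}")
    return first, last


line_range = resolve_line_range("12| a = 1\n13| b = a | 2")
```

Only the first and last lines are inspected, so the numbers on the lines in between are never checked against the actual file.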

In-Place File Splicing

Using the extracted line numbers, the agent rebuilds the file's list of lines around the edited range and writes the result back to the same path:

lines = file_content.splitlines()

# Keep everything before first_line and after last_line (both 1-based, inclusive).
new_content = lines[: first_line - 1] + edited_code.splitlines() + lines[last_line:]

with open(file_path, "w") as f:
    f.write("\n".join(new_content))

This approach bypasses external diff libraries entirely (though diff_match_patch is imported for potential future use) and gives line-level control over where each replacement lands.
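The splice can be tried on an in-memory string. The sketch below also applies several edits from the bottom of the file upward so that earlier line numbers remain valid; that ordering is an assumption made for illustration, since the article does not say how multiple blocks in one response are sequenced. Both function names are hypothetical.

```python
def splice_lines(content: str, first_line: int, last_line: int, edited_code: str) -> str:
    """Replace the 1-based inclusive range first_line..last_line with edited_code."""
    lines = content.splitlines()
    new_lines = lines[: first_line - 1] + edited_code.splitlines() + lines[last_line:]
    return "\n".join(new_lines)


def apply_edits(content: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply non-overlapping edits bottom-up so earlier line numbers stay valid."""
    for first_line, last_line, edited_code in sorted(edits, reverse=True):
        content = splice_lines(content, first_line, last_line, edited_code)
    return content


source = "a = 1\nb = 2\nc = 3\nd = 4"
updated = apply_edits(source, [(2, 2, "b = 20\nb2 = 21"), (4, 4, "d = 40")])
print(updated)
```

Because the result is joined with "\n", a trailing newline in the original file is not preserved by this style of splice.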

Handling New File Creation

When the target path does not exist, the agent switches to a separate generation path using agent/developer/prompts/implement_new_file_prompt.md. The creating_diffs_for_task node detects missing files and invokes a dedicated runnable to synthesize complete file content:

new_file_content = create_new_file_runnable.invoke({...})

with open(file_path, "w") as file:
    file.write(new_file_content)

New files bypass the diff parsing logic entirely, writing the LLM-generated content directly to disk.
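The branch between the two paths can be sketched as follows. The function name and its callable parameters are hypothetical: generate_new_file stands in for the create_new_file_runnable call, and apply_diffs stands in for the parse-and-splice logic.

```python
import os
import tempfile
from typing import Callable


def write_file_for_task(
    file_path: str,
    generate_new_file: Callable[[], str],
    apply_diffs: Callable[[str], str],
) -> str:
    """Create the file from scratch if it is missing, otherwise rewrite it via diffs."""
    if not os.path.exists(file_path):
        content = generate_new_file()  # new-file path: no diff parsing at all
    else:
        with open(file_path, "r", encoding="utf-8") as f:
            content = apply_diffs(f.read())
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(content)
    return content


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hello.py")
    # First call: the file does not exist, so it is generated.
    created = write_file_for_task(path, lambda: "print('hello')\n", lambda s: s)
    # Second call: the file exists, so the diff path runs instead.
    edited = write_file_for_task(path, lambda: "", lambda s: s.replace("hello", "world"))
```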

State Management with Pydantic Models

The agent maintains workflow state through the SoftwareDeveloperState class defined in agent/developer/state.py. This state tracks:

  • The current file path and pre-numbered content
  • Research messages accumulated during the optional tool-calling loop
  • A list of DiffTask objects stored in the Diffs Pydantic model

Because all context lives in this typed state object, the individual nodes can stay stateless: each one reads what it needs from the state and returns its updates, which preserves context across node transitions from prepare_for_implementation through creating_diffs_for_task.
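The rough shape of these models might resemble the sketch below. It uses standard-library dataclasses as a stand-in for Pydantic so that it runs without third-party packages, and every field name except atomic_implementation_research is a guess based on the description above rather than the repository's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class DiffTask:
    original_code_snippet: str  # numbered lines, e.g. "12| x = 1"
    edit_code_snippet: str      # replacement code without line numbers


@dataclass
class Diffs:
    diffs: list[DiffTask] = field(default_factory=list)


@dataclass
class SoftwareDeveloperStateSketch:
    file_path: str = ""
    file_content: str = ""  # pre-numbered content shown to the LLM
    atomic_implementation_research: list[str] = field(default_factory=list)
    diffs: Diffs = field(default_factory=Diffs)


state = SoftwareDeveloperStateSketch(file_path="app.py", file_content="1| x = 1")
state.diffs.diffs.append(DiffTask("1| x = 1", "x = 2"))
```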

Summary

  • The Developer agent uses a LangGraph workflow in agent/developer/graph.py to orchestrate file modifications through discrete, testable nodes
  • Diff generation relies on agent/developer/prompts/create_diff_prompt.md to constrain LLM outputs into <code_change_request> blocks with numbered original snippets
  • Line numbers embedded in the original code (e.g., "42| def function():") enable precise range extraction without fuzzy matching algorithms
  • The creating_diffs_for_task function parses these blocks and applies edits via Python list splicing: content[:start-1] + new_lines + content[end:]
  • New files bypass the diff mechanism entirely, using implement_new_file_prompt.md to generate complete content from scratch
  • State persistence is handled by the SoftwareDeveloperState and Diffs Pydantic models in agent/developer/state.py

Frequently Asked Questions

How does the Developer agent ensure precise code replacements without using traditional diff algorithms?

The agent requires the LLM to include exact line numbers (prefixed as "123| ") in the original_code_snippet field. It extracts the first and last line numbers to determine the exact slice indices, then replaces that range directly in the file's line array. This avoids the ambiguity of context-based diff matching by relying on explicit line addressing provided by the LLM.

What format must the LLM use when generating code changes?

The LLM must output one or more <code_change_request> blocks, each containing two labeled sections: original_code_snippet with numbered lines from the current file, and edit_code_snippet with the replacement code without line numbers. This structured format is strictly enforced by the prompt template in agent/developer/prompts/create_diff_prompt.md.

How does the agent handle files that don't exist yet?

If the target file path does not exist, the creating_diffs_for_task node detects this condition and switches to the new-file workflow. It invokes a separate prompt (agent/developer/prompts/implement_new_file_prompt.md) to generate the complete file content, then writes it directly using standard file I/O without parsing diff blocks or extracting line numbers.

Where is the workflow state stored during diff generation?

State is managed through the SoftwareDeveloperState Pydantic model in agent/developer/state.py. This includes the current atomic task, numbered file content, research history, and parsed Diffs objects. LangGraph handles the state transitions between nodes like prepare_for_implementation and creating_diffs_for_task, so each node receives the context produced by the nodes before it.
