How the Developer Agent Generates and Applies Code Diffs in SWE-agent
The Developer agent generates code diffs by prompting an LLM to produce structured <code_change_request> blocks containing numbered original snippets and replacement code, then parses line numbers to splice edits directly into files without external diff libraries.
The langtalks/swe-agent repository implements an autonomous software engineering agent that modifies source code through a specialized pipeline. Unlike traditional tools that rely on unified diff patches, the Developer agent uses a LangGraph workflow to generate and apply file changes through explicit line-based splicing.
LangGraph Workflow Architecture
The diff generation pipeline is orchestrated through a stateful LangGraph workflow defined in agent/developer/graph.py. The compiled graph swe_developer chains specialized nodes that transform high-level tasks into concrete file modifications.
Core Workflow Nodes
The workflow progresses through distinct phases connected by conditional edges:
- start_implementing: Initializes the graph and selects the first atomic task from the task queue
- prepare_for_implementation: Loads the target file content (or marks it as new) and resets prior research context
- get_clear_implementation_plan_runnable: Generates a detailed implementation plan using LLM reasoning
- creating_diffs_for_task: Extracts structured diff specifications from the LLM or creates new file content
- proceed_to_next_atomic_task: Advances to the next task or terminates the workflow
The workflow is compiled into the runnable swe_developer with configuration tags for versioning and tracing.
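To make the control flow concrete, here is a stdlib-only simulation of the node order listed above. It does not use LangGraph: the node names mirror the article, but the node bodies, the dict-based state, and the helpers next_node and run are hypothetical stand-ins, and the exact conditional-edge logic in the repository is an assumption.

```python
from typing import Callable

END = "__end__"

# Placeholder node bodies: each takes the state dict and returns it updated.
def start_implementing(state: dict) -> dict:
    state["current_index"] = 0  # select the first atomic task
    return state

def prepare_for_implementation(state: dict) -> dict:
    state["research"] = []  # reset prior research context
    return state

def get_clear_implementation_plan_runnable(state: dict) -> dict:
    task = state["tasks"][state["current_index"]]
    state["plan"] = f"plan for: {task}"  # stands in for the LLM planning call
    return state

def creating_diffs_for_task(state: dict) -> dict:
    state["log"].append(state["plan"])  # stands in for generating/applying diffs
    return state

def proceed_to_next_atomic_task(state: dict) -> dict:
    state["current_index"] += 1
    return state

NODES: dict[str, Callable[[dict], dict]] = {
    f.__name__: f
    for f in (start_implementing, prepare_for_implementation,
              get_clear_implementation_plan_runnable,
              creating_diffs_for_task, proceed_to_next_atomic_task)
}

# Unconditional edges between consecutive phases.
EDGES = {
    "prepare_for_implementation": "get_clear_implementation_plan_runnable",
    "get_clear_implementation_plan_runnable": "creating_diffs_for_task",
    "creating_diffs_for_task": "proceed_to_next_atomic_task",
}

def next_node(name: str, state: dict) -> str:
    if name in EDGES:
        return EDGES[name]
    # Conditional edge after start/advance: continue while tasks remain, else stop.
    if state["current_index"] < len(state["tasks"]):
        return "prepare_for_implementation"
    return END

def run(tasks: list[str]) -> dict:
    state: dict = {"tasks": tasks, "log": []}
    node = "start_implementing"
    while node != END:
        state = NODES[node](state)
        node = next_node(node, state)
    return state
```

Calling run(["add helper", "fix bug"]) visits the planning and diff nodes once per task, in order, and stops when the queue is exhausted; in the real workflow, LangGraph's conditional edges play the role of next_node.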
Generating Structured Diff Specifications
The agent produces machine-readable edit instructions through a carefully engineered prompt template that constrains LLM outputs to a specific XML-like format.
The Prompt Template
Located at agent/developer/prompts/create_diff_prompt.md, the prompt instructs the LLM to wrap each modification in <code_change_request> blocks containing:
- original_code_snippet: The exact lines from the current file, each prefixed with its line number (e.g., "123| def foo():")
- edit_code_snippet: The replacement code, without line numbers
This template is loaded into a LangChain runnable using the helper in helpers/prompts.py:
extract_diffs_tasks_prompt = markdown_to_prompt_template(
"agent/developer/prompts/create_diff_prompt.md"
)
extract_diff_runnable = extract_diffs_tasks_prompt | ChatAnthropic(...) | StrOutputParser()
LLM Invocation with Context
Inside the creating_diffs_for_task node, the agent invokes the runnable with pre-numbered file content and task context:
diffs_tasks = extract_diff_runnable.invoke({
"task": current_atomic_task.atomic_task,
"additional_context": current_atomic_task.additional_context,
"research": convert_tools_messages_to_ai_and_human(state.atomic_implementation_research),
"file_path": file_path,
"file_content": file_content, # Lines prefixed with "1| ", "2| ", etc.
"output_format": JsonOutputParser(pydantic_object=Diffs).get_format_instructions()
})
The LLM returns a string containing one or more <code_change_request> blocks that specify exactly what to change and where.
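The file_content passed to the runnable is already numbered. A helper along these lines could produce that format; number_lines and strip_line_numbers are hypothetical names, and the exact "N| " spacing is an assumption based on the "1| " example in the invocation comment.

```python
def number_lines(text: str) -> str:
    """Prefix each line with its 1-based number in "N| " style (hypothetical helper)."""
    return "\n".join(f"{i}| {line}" for i, line in enumerate(text.splitlines(), start=1))

def strip_line_numbers(numbered: str) -> str:
    """Undo number_lines by dropping everything up to the first "| " on each line."""
    return "\n".join(
        line.split("| ", 1)[1] if "| " in line else line
        for line in numbered.splitlines()
    )
```

Both functions work on splitlines(), so a trailing newline in the original text is not preserved by the round trip.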
Parsing and Applying Code Diffs
Rather than generating unified diff patches, the agent parses the structured blocks and applies edits through direct line-based splicing in Python.
Extracting Change Blocks
The creating_diffs_for_task function (lines 66-90 in agent/developer/graph.py) uses regular expressions to isolate individual modifications:
blocks = re.findall(
r"<code_change_request>(.*?)</code_change_request>",
diffs_tasks,
re.DOTALL
)
for block in blocks:
match = re.search(
r"original_code_snippet:\s*(.*?)\s*edit_code_snippet:\s*(.*)",
block,
re.DOTALL,
)
if match:
original_code = match.group(1).strip()
edited_code = match.group(2).strip()
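The two regular expressions above can be exercised on their own. This sketch wraps them in a hypothetical extract_changes function and runs them on a made-up LLM response; SAMPLE_RESPONSE is illustrative, not real output from the repository.

```python
import re

SAMPLE_RESPONSE = """Here is the change.
<code_change_request>
original_code_snippet:
2| def greet(name):
3|     return "hi " + name
edit_code_snippet:
def greet(name: str) -> str:
    return f"hi {name}"
</code_change_request>
"""

def extract_changes(response: str) -> list[tuple[str, str]]:
    """Return (original_code, edited_code) pairs using the regexes quoted above."""
    changes = []
    blocks = re.findall(r"<code_change_request>(.*?)</code_change_request>", response, re.DOTALL)
    for block in blocks:
        match = re.search(
            r"original_code_snippet:\s*(.*?)\s*edit_code_snippet:\s*(.*)",
            block,
            re.DOTALL,
        )
        if match:  # blocks missing either label are skipped silently
            changes.append((match.group(1).strip(), match.group(2).strip()))
    return changes
```

One subtlety worth knowing: str.strip() also removes leading spaces, so if an edit_code_snippet begins with an indented line, that first line loses its indentation. Stripping only newlines (strip("\n")) would avoid this.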
Line Number Resolution
The agent extracts absolute line numbers from the numbered prefixes in the original snippet:
orig_lines = original_code.splitlines()
first_line = int(orig_lines[0].split("|")[0].strip())
last_line = int(orig_lines[-1].split("|")[0].strip())
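Note that only the first and last lines of the snippet are read; intermediate lines are never checked against the file. The sketch below is a hypothetical hardened variant (parse_line_range is not a repository function) that fails with a descriptive error when a line lacks an "N|" prefix or the range is inverted, cases where the snippet above would either raise a generic int() ValueError or proceed with a bad range.

```python
def parse_line_range(original_code: str) -> tuple[int, int]:
    """Return (first_line, last_line) from an "N| code" snippet (hypothetical helper)."""
    lines = original_code.splitlines()
    if not lines:
        raise ValueError("original_code_snippet is empty")

    def number_of(line: str) -> int:
        prefix, sep, _ = line.partition("|")
        if not sep or not prefix.strip().isdecimal():
            raise ValueError(f"line lacks an 'N|' prefix: {line!r}")
        return int(prefix.strip())

    first, last = number_of(lines[0]), number_of(lines[-1])
    if first < 1 or last < first:
        raise ValueError(f"invalid line range {first}-{last}")
    return first, last
```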
In-Place File Splicing
Using the extracted indices, the agent performs an in-place replacement by reconstructing the file array:
new_content = (
file_content.splitlines()[: first_line - 1] +
edited_code.splitlines() +
file_content.splitlines()[last_line:]
)
with open(file_path, "w") as f:
f.write("\n".join(new_content))
This approach bypasses external diff libraries entirely (though diff_match_patch is imported for potential future use) and gives precise line-level control over where each edit lands.
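When a response contains several blocks, an edit that changes the line count shifts the line numbers of everything below it. The article does not say how the repository handles this; the sketch below assumes one common remedy, applying edits from the bottom of the file upward. splice and apply_edits are hypothetical names, and overlapping edits are assumed not to occur. It also restores a final newline, which "\n".join(...) in the snippet above drops.

```python
def splice(lines: list[str], first_line: int, last_line: int, new_lines: list[str]) -> list[str]:
    """Replace 1-based inclusive lines first_line..last_line with new_lines."""
    return lines[: first_line - 1] + new_lines + lines[last_line:]

def apply_edits(text: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply (first_line, last_line, replacement) edits, bottom of the file first.

    Working upward means a replacement never shifts the line numbers of
    edits that sit above it. Overlapping edits are not detected here.
    """
    lines = text.splitlines()
    for first, last, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        lines = splice(lines, first, last, replacement.splitlines())
    result = "\n".join(lines)
    # "\n".join drops a final newline; put it back if the input had one.
    return result + "\n" if text.endswith("\n") else result
```

Applied top-down instead, the example in the test below would replace the wrong lines, because the first edit grows the file by one line.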
Handling New File Creation
When the target path does not exist, the agent switches to a separate generation path using agent/developer/prompts/implement_new_file_prompt.md. The creating_diffs_for_task node detects missing files and invokes a dedicated runnable to synthesize complete file content:
new_file_content = create_new_file_runnable.invoke({...})
with open(file_path, "w") as file:
file.write(new_file_content)
New files bypass the diff parsing logic entirely, writing the LLM-generated content directly to disk.
State Management with Pydantic Models
The agent maintains workflow state through the SoftwareDeveloperState class defined in agent/developer/state.py. This state tracks:
- The current file path and pre-numbered content
- Research messages accumulated during the optional tool-calling loop
- A list of DiffTask objects stored in the Diffs Pydantic model
Because this typed state object is passed explicitly from node to node, the nodes themselves need not keep any hidden state, while context is still preserved across the transitions from prepare_for_implementation through creating_diffs_for_task.
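A rough picture of that state, using dataclasses in place of Pydantic so it runs with only the standard library. The class names come from the article, and file_path, file_content, and atomic_implementation_research appear in the invocation shown earlier; the DiffTask fields and the rest of the layout are hypothetical. The node sketch returns a new state via dataclasses.replace rather than mutating its input.

```python
from dataclasses import dataclass, field, replace

@dataclass
class DiffTask:
    # Hypothetical fields: the article names the class but not its attributes.
    original_code_snippet: str
    edit_code_snippet: str

@dataclass
class Diffs:
    diffs: list[DiffTask] = field(default_factory=list)

@dataclass
class SoftwareDeveloperState:
    # Stdlib stand-in for the Pydantic model; layout is illustrative.
    file_path: str = ""
    file_content: str = ""  # pre-numbered "N| " content
    atomic_implementation_research: list[str] = field(default_factory=list)
    diffs: Diffs = field(default_factory=Diffs)

def prepare_for_implementation(
    state: SoftwareDeveloperState, path: str, numbered: str
) -> SoftwareDeveloperState:
    # Return a fresh state with the new file loaded and research context reset.
    return replace(state, file_path=path, file_content=numbered,
                   atomic_implementation_research=[])
```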
Summary
- The Developer agent uses a LangGraph workflow in agent/developer/graph.py to orchestrate file modifications through discrete, testable nodes
- Diff generation relies on agent/developer/prompts/create_diff_prompt.md to constrain LLM outputs into <code_change_request> blocks with numbered original snippets
- Line numbers embedded in the original code (e.g., "42| def function():") enable precise range extraction without fuzzy matching algorithms
- The creating_diffs_for_task function parses these blocks and applies edits via Python list splicing: content[:start-1] + new_lines + content[end:]
- New files bypass the diff mechanism entirely, using implement_new_file_prompt.md to generate complete content from scratch
- State persistence is handled by the SoftwareDeveloperState and Diffs Pydantic models in agent/developer/state.py
Frequently Asked Questions
How does the Developer agent ensure precise code replacements without using traditional diff algorithms?
The agent requires the LLM to include exact line numbers (prefixed as "123| ") in the original_code_snippet field. It extracts the first and last line numbers to determine the exact slice indices, then replaces that range directly in the file's line array. This avoids the ambiguity of context-based diff matching by relying on explicit line addressing provided by the LLM.
What format must the LLM use when generating code changes?
The LLM must output one or more <code_change_request> blocks, each containing two labeled sections: original_code_snippet with numbered lines from the current file, and edit_code_snippet with the replacement code without line numbers. The prompt template in agent/developer/prompts/create_diff_prompt.md specifies this format, but a prompt cannot strictly enforce it: any block whose labels do not match the parser's regular expression is silently skipped.
How does the agent handle files that don't exist yet?
If the target file path does not exist, the creating_diffs_for_task node detects this condition and switches to the new-file workflow. It invokes a separate prompt (agent/developer/prompts/implement_new_file_prompt.md) to generate the complete file content, then writes it directly using standard file I/O without parsing diff blocks or extracting line numbers.
Where is the workflow state stored during diff generation?
State is managed through the SoftwareDeveloperState Pydantic model in agent/developer/state.py. This includes the current atomic task, numbered file content, research history, and parsed Diffs objects. LangGraph passes this state between nodes such as prepare_for_implementation and creating_diffs_for_task, so each node receives the context produced by the nodes before it.