How to Combine Multiple LLMs (Claude, Gemini, DeepSeek) in a Development Workflow
Combine Claude 3.7 Sonnet, Gemini 2.5 Pro, and DeepSeek-V3 using an orchestration pattern that splits tasks by model strength, executes calls in parallel via Python's asyncio, and aggregates results into a unified development output.
The cyfyifanchen/one-person-company repository curates the most powerful large language models for developers and demonstrates how they can be orchestrated together to get richer, more reliable results. A multi-LLM architecture lets you use each model's particular capabilities while offsetting individual weaknesses through parallel verification and complementary task distribution.
Understanding the Multi-LLM Architecture
Model Selection: Claude, Gemini, and DeepSeek
According to lines 69-73 of the repository's README.md, each model serves a distinct purpose in the development workflow:
| Model | Strength | Typical Use |
|---|---|---|
| Claude 3.7 Sonnet (Anthropic) | General-purpose, fast knowledge updates | Primary reasoning, brainstorming, code generation |
| Gemini 2.5 Pro (Google) | Strong logical reasoning, multimodal support | Structured problem-solving, UI mock-ups, data-visualisation |
| DeepSeek-V3 (DeepSeek) | Excellent coding ability, high-quality output | Verification, edge-case testing, alternative implementations |
This combination allows Claude to excel at high-level architectural reasoning, Gemini to handle multimodal outputs like diagrams, and DeepSeek to focus on concrete code implementation and edge-case verification.
Implementing the LLM Orchestration Pattern
The Five-Stage Workflow
The repository outlines a robust architectural pattern for implementing this multi-LLM strategy:
- LLM Orchestrator – A thin driver (e.g., a Python script) that receives a developer request and decides which model(s) to call.
- Task-Splitter – The request is broken into subtasks that match each model's strength (e.g., "draft API design" → Claude, "verify type safety" → DeepSeek).
- Parallel Execution – The orchestrator fires the calls concurrently via asyncio or threading, reducing latency.
- Result Aggregation – A merger component normalises responses, de-duplicates suggestions, and ranks them by confidence.
- Feedback Loop – The developer can accept, edit, or reject any suggestion; the orchestrator records the outcome to refine future routing.
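The task-splitting stage can be sketched as a small routing table. This is a minimal illustration rather than the repository's implementation: the ROUTES keywords, the Subtask type, and the split_request helper are all hypothetical, and a real splitter would likely need something sturdier than keyword matching.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical routing table: keyword -> model suited to that kind of subtask.
ROUTES = {
    "design": "claude",
    "diagram": "gemini",
    "verify": "deepseek",
    "implement": "deepseek",
}
DEFAULT_MODEL = "claude"  # assumption: fall back to the general-purpose model


@dataclass
class Subtask:
    model: str
    prompt: str


def split_request(subtasks: List[str]) -> List[Subtask]:
    """Assign each subtask description to a model by the first matching keyword."""
    routed = []
    for text in subtasks:
        lowered = text.lower()
        model = next((m for kw, m in ROUTES.items() if kw in lowered), DEFAULT_MODEL)
        routed.append(Subtask(model=model, prompt=text))
    return routed


if __name__ == "__main__":
    requests = ["Draft API design", "Verify type safety", "Draw a sequence diagram"]
    for sub in split_request(requests):
        print(f"{sub.model:>8}: {sub.prompt}")
```

Keeping the routing rules in plain data makes them easy to adjust once the feedback loop shows which model handles which kind of subtask best.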
Visual Workflow Architecture
    ┌──────────────────────────────────────┐
    │           Developer Issue            │  ← "Add pagination to the user-list API"
    └──────────────────┬───────────────────┘
                       │
                       ▼
    ┌──────────────────────────────────────┐
    │           LLM Orchestrator           │  (decides which models to call)
    └────┬─────────────┬─────────────┬─────┘
         │             │             │
         ▼             ▼             ▼
    ┌──────────┐  ┌──────────┐  ┌──────────┐
    │  Claude  │  │  Gemini  │  │ DeepSeek │
    └────┬─────┘  └────┬─────┘  └────┬─────┘
         │             │             │
         ▼             ▼             ▼
      (Design)     (Diagram)       (Code)
         │             │             │
         └──────► Merge & Rank ◄─────┘
                       │
                       ▼
    ┌──────────────────────────────────────┐
    │         Consolidated Output          │  ← Unified markdown with design notes,
    └──────────────────────────────────────┘    diagrams, and ready-to-run code snippets
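The "Merge & Rank" step in the diagram could look roughly like the sketch below. It assumes each suggestion already carries a confidence score; the LLM APIs do not return one, so the Suggestion type, its confidence field, and merge_and_rank are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Suggestion:
    model: str         # which LLM produced the suggestion
    text: str          # the suggestion itself
    confidence: float  # hypothetical score in [0, 1], assigned by your own heuristic


def merge_and_rank(suggestions: List[Suggestion]) -> List[Suggestion]:
    """De-duplicate by normalised text, keep the most confident copy, sort descending."""
    best: Dict[str, Suggestion] = {}
    for s in suggestions:
        key = " ".join(s.text.lower().split())  # normalise case and whitespace
        if key not in best or s.confidence > best[key].confidence:
            best[key] = s
    return sorted(best.values(), key=lambda s: s.confidence, reverse=True)


if __name__ == "__main__":
    items = [
        Suggestion("claude", "Use cursor pagination", 0.7),
        Suggestion("deepseek", "use  cursor pagination ", 0.9),
        Suggestion("gemini", "Add an index on created_at", 0.8),
    ]
    for s in merge_and_rank(items):
        print(f"{s.confidence:.1f} {s.model}: {s.text}")
```

Exact-match de-duplication only catches near-identical wording; suggestions that say the same thing differently would need a similarity measure instead.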
Building the LLM Orchestrator in Python
The following framework-agnostic implementation uses httpx for asynchronous HTTP requests. Configure the ENDPOINTS and HEADERS dictionaries with your actual API URLs and authentication tokens.
```python
import asyncio
import os
from typing import Dict, List

import httpx

# ──────────────────────────────────────
# 1️⃣ LLM endpoint configuration
# ──────────────────────────────────────
ENDPOINTS = {
    "claude": "https://api.anthropic.com/v1/messages",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent",
    "deepseek": "https://api.deepseek.com/v1/chat/completions",
}

# Keys are read from environment variables; the variable names are placeholders.
HEADERS = {
    "claude": {
        "x-api-key": os.environ.get("ANTHROPIC_API_KEY", "YOUR_CLAUDE_KEY"),
        "anthropic-version": "2023-06-01",  # required by the Messages API
    },
    "gemini": {"x-goog-api-key": os.environ.get("GEMINI_API_KEY", "YOUR_GEMINI_KEY")},
    "deepseek": {
        "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', 'YOUR_DEEPSEEK_KEY')}"
    },
}

# ──────────────────────────────────────
# 2️⃣ Prompt templates tailored to each model
# ──────────────────────────────────────
# Model IDs change over time; check each provider's current model list.
PROMPTS = {
    "claude": lambda task: {
        "model": "claude-3-7-sonnet-20250219",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": f"Design a high-level architecture for: {task}"}],
    },
    "gemini": lambda task: {
        "contents": [{"role": "user", "parts": [{"text": f"Create a UML diagram for: {task}"}]}],
    },
    "deepseek": lambda task: {
        "model": "deepseek-chat",  # DeepSeek's general chat model ID
        "messages": [{"role": "user", "content": f"Write production-ready Python code for: {task}"}],
    },
}

# ──────────────────────────────────────
# 3️⃣ Async helper to call a single LLM
# ──────────────────────────────────────
async def call_llm(name: str, task: str) -> Dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            ENDPOINTS[name],
            headers=HEADERS[name],
            json=PROMPTS[name](task),
            timeout=30,
        )
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return {name: resp.json()}

# ──────────────────────────────────────
# 4️⃣ Orchestrator – fire calls in parallel
# ──────────────────────────────────────
async def orchestrate(task: str) -> List[Dict]:
    tasks = [
        call_llm("claude", task),
        call_llm("gemini", task),
        call_llm("deepseek", task),
    ]
    # With return_exceptions=False, the first failing call raises here.
    results = await asyncio.gather(*tasks, return_exceptions=False)
    return results

# ──────────────────────────────────────
# 5️⃣ Simple merger – one Markdown section per model
# ──────────────────────────────────────
def merge_results(raw: List[Dict]) -> str:
    sections = []
    for entry in raw:
        name, payload = next(iter(entry.items()))
        if name == "claude":
            sections.append(f"## Design (Claude)\n{payload['content'][0]['text']}")
        elif name == "gemini":
            # generateContent returns text parts, so the diagram arrives as text
            # (for example PlantUML or Mermaid source), not as an image.
            diagram = payload["candidates"][0]["content"]["parts"][0]["text"]
            sections.append(f"## Diagram (Gemini)\n{diagram}")
        elif name == "deepseek":
            code = payload["choices"][0]["message"]["content"]
            sections.append(f"## Implementation (DeepSeek)\n```python\n{code}\n```")
    return "\n\n".join(sections)

# ──────────────────────────────────────
# 6️⃣ Run example
# ──────────────────────────────────────
if __name__ == "__main__":
    task_description = "Add cursor-based pagination to a FastAPI users endpoint"
    raw_responses = asyncio.run(orchestrate(task_description))
    final_md = merge_results(raw_responses)
    print(final_md)
```
Step-by-Step Implementation Breakdown
| Step | Action | Reason |
|---|---|---|
| 1️⃣ | Define each model's endpoint and authentication | Keeps secrets out of the logic (store them in environment variables) |
| 2️⃣ | Create model-specific prompts | Leverages each LLM's unique strength |
| 3️⃣ | call_llm performs a single HTTP request | Isolates networking concerns |
| 4️⃣ | orchestrate runs all three calls concurrently | Minimises overall latency via asyncio.gather |
| 5️⃣ | merge_results normalises disparate responses | Gives the developer a coherent Markdown view |
| 6️⃣ | Run with a real task | Demonstrates end-to-end usage |
You can extend this skeleton with caching (e.g., diskcache or Redis) to avoid repeated calls for identical tasks, result ranking using a tiny meta-model that scores relevance, and error-handling that falls back to a secondary model when one service is down.
Key Repository Files and Resources
The cyfyifanchen/one-person-company repository provides the foundational knowledge for implementing this multi-LLM strategy:
| File | Why it matters | Link |
|---|---|---|
| README.md | The central catalogue of AI tools, including the LLM table and the visual illustration of the LLM section. Provides the authoritative list of recommended models and the high-level rationale for their inclusion. | README.md |
| assets/jpg/llm.jpg | The banner image that visually groups the three LLMs. Helpful when building documentation or UI that references the repo's branding. | assets/jpg/llm.jpg |
| assets/gif/banner-cape.gif | The animated hero banner showing the repo's "one-person-company" theme. Gives context for the repo's purpose (a personal AI toolbox) and can be reused in internal wikis. | assets/gif/banner-cape.gif |
These files together explain what models to use, why they were chosen, and how they fit into a broader personal-productivity workflow. By combining them with the orchestration pattern above, developers can get the best of Claude, Gemini, and DeepSeek in a single, streamlined development loop.
Summary
- Claude 3.7 Sonnet excels at high-level architectural reasoning and brainstorming, making it ideal for initial design phases in your development workflow.
- Gemini 2.5 Pro provides multimodal capabilities for generating diagrams and structured problem-solving outputs that complement text-based code.
- DeepSeek-V3 delivers production-ready code generation and rigorous edge-case testing, catching implementation details that other models might miss.
- An LLM Orchestrator using Python's asyncio and httpx can parallelize calls to all three models, significantly reducing latency compared to sequential execution.
- The Result Aggregation layer normalizes disparate API responses into a unified Markdown document, giving developers a single coherent view of design notes, diagrams, and executable code snippets.
Frequently Asked Questions
What is the primary advantage of combining multiple LLMs instead of using a single model?
Combining multiple LLMs allows you to leverage specific architectural strengths: Claude handles high-level reasoning, Gemini manages multimodal outputs like diagrams, and DeepSeek focuses on code precision. This multi-LLM development workflow reduces blind spots that single models might have while providing redundant verification paths for critical code sections.
How does the orchestrator handle authentication for multiple LLM APIs securely?
The orchestrator should read API keys from environment variables or a secret manager and inject them through the HEADERS dictionary rather than hardcoding them in the logic. Each provider expects a different header: x-api-key for Claude, x-goog-api-key for Gemini, and an Authorization Bearer token for DeepSeek.
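One way to wire this up is a small helper that builds the headers from the environment at start-up. This is a sketch under stated assumptions: build_headers and the three environment variable names are hypothetical, so substitute whatever names your deployment defines.

```python
import os
from typing import Dict, Mapping


def build_headers(env: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, str]]:
    """Build per-provider auth headers from environment variables.

    The variable names are assumptions for this sketch.
    Raises KeyError if one of them is missing, so misconfiguration fails early.
    """
    return {
        "claude": {
            "x-api-key": env["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",  # version header required by the Messages API
        },
        "gemini": {"x-goog-api-key": env["GEMINI_API_KEY"]},
        "deepseek": {"Authorization": f"Bearer {env['DEEPSEEK_API_KEY']}"},
    }


if __name__ == "__main__":
    # A fake environment keeps the example runnable without real credentials.
    fake_env = {"ANTHROPIC_API_KEY": "a", "GEMINI_API_KEY": "g", "DEEPSEEK_API_KEY": "d"}
    print(sorted(build_headers(fake_env)))
```

Accepting the environment as a parameter makes the helper easy to test and lets a secret manager supply the mapping instead of os.environ.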
Can this multi-LLM architecture work with synchronous codebases?
While the example uses asyncio for parallel execution, you can adapt the pattern for synchronous codebases by using threading or sequential calls. Parallel execution is still recommended to minimize latency: three sequential API calls take roughly the sum of their individual response times, whereas the concurrent asyncio.gather approach in orchestrate() takes about as long as the slowest call.
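For a synchronous codebase, a thread pool from the standard library gives similar concurrency without async/await. The sketch below is illustrative only: run_in_threads and fake_sync_call are hypothetical names, and time.sleep stands in for a blocking HTTP request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_in_threads(call: Callable[[str, str], str], models: List[str], task: str) -> Dict[str, str]:
    """Call each model in its own worker thread and collect results by model name."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(call, name, task) for name in models}
        # result() blocks until that call finishes and re-raises any exception it hit.
        return {name: fut.result() for name, fut in futures.items()}


def fake_sync_call(name: str, task: str) -> str:
    # Hypothetical blocking call; the sleep stands in for network latency.
    time.sleep(0.2)
    return f"{name} done: {task}"


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_in_threads(fake_sync_call, ["claude", "gemini", "deepseek"], "add pagination")
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")  # expected to be near one call's latency, not three
```

Threads suit this workload because the calls are I/O-bound: each worker spends its time waiting on the network, so the global interpreter lock is not the bottleneck.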
What file in the one-person-company repository contains the recommended LLM combinations?
The authoritative list of recommended models and their specific use cases is documented in the README.md file at lines 69-73, which includes the comparison table of Claude, Gemini, and DeepSeek. The visual representation of this multi-LLM strategy is also available in assets/jpg/llm.jpg.