
How to Combine Multiple LLMs (Claude, Gemini, DeepSeek) in a Development Workflow

February 28, 2026 | cyfyifanchen/one-person-company

Combine Claude 3.7 Sonnet, Gemini 2.5 Pro, and DeepSeek-V3 using an orchestration pattern that splits tasks by model strength, executes calls in parallel via Python's asyncio, and aggregates results into a unified development output.

The cyfyifanchen/one-person-company repository curates large language models for developers and demonstrates how they can be orchestrated together to get richer, more reliable results. With a multi-LLM architecture, you use each model's strengths while offsetting individual weaknesses through parallel verification and complementary task distribution.

Understanding the Multi-LLM Architecture

Model Selection: Claude, Gemini, and DeepSeek

According to the repository's README.md (lines 69-73), each model serves a distinct purpose in the development workflow:

Model	Strength	Typical Use
Claude 3.7 Sonnet (Anthropic)	General-purpose, fast knowledge updates	Primary reasoning, brainstorming, code generation
Gemini 2.5 Pro (Google)	Strong logical reasoning, multimodal support	Structured problem-solving, UI mock-ups, data-visualisation
DeepSeek-V3 (DeepSeek)	Excellent coding ability, high-quality output	Verification, edge-case testing, alternative implementations

This combination allows Claude to excel at high-level architectural reasoning, Gemini to handle multimodal outputs like diagrams, and DeepSeek to focus on concrete code implementation and edge-case verification.

Implementing the LLM Orchestration Pattern

The Five-Stage Workflow

The repository outlines a robust architectural pattern for implementing this multi-LLM strategy:

1. LLM Orchestrator – A thin driver (e.g., a Python script) that receives a developer request and decides which model(s) to call.
2. Task-Splitter – The request is broken into subtasks that match each model's strength (e.g., "draft API design" → Claude, "verify type safety" → DeepSeek).
3. Parallel Execution – The orchestrator fires the calls concurrently via asyncio or threading, reducing latency.
4. Result Aggregation – A merger component normalises responses, de-duplicates suggestions, and ranks them by confidence.
5. Feedback Loop – The developer can accept, edit, or reject any suggestion; the orchestrator records the outcome to refine future routing.
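
The Task-Splitter stage can be sketched as a simple keyword router. This is a minimal illustration, not code from the repository: the ROUTES table, the Subtask type, and the split_task function are all hypothetical names, and the keywords are assumptions chosen to match the examples above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical routing table: keyword -> model name. The keywords are
# illustrative assumptions, not rules taken from the repository.
ROUTES = {
    "design": "claude",
    "draft": "claude",
    "diagram": "gemini",
    "mock-up": "gemini",
    "verify": "deepseek",
    "test": "deepseek",
}
DEFAULT_MODEL = "claude"  # assumed fallback when no keyword matches


@dataclass(frozen=True)
class Subtask:
    model: str
    prompt: str


def split_task(subtasks: List[str]) -> List[Subtask]:
    """Assign each subtask to the first model whose keyword appears in it."""
    routed = []
    for text in subtasks:
        lowered = text.lower()
        model = next(
            (m for keyword, m in ROUTES.items() if keyword in lowered),
            DEFAULT_MODEL,
        )
        routed.append(Subtask(model=model, prompt=text))
    return routed


if __name__ == "__main__":
    for item in split_task(["Draft API design", "Verify type safety"]):
        print(item.model, "<-", item.prompt)
```

Keyword matching is the simplest possible router; a production splitter would more likely use a classifier or let one of the models decompose the request itself.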

Visual Workflow Architecture


┌─────────────────────────────────┐
│         Developer Issue         │  ← "Add pagination to the user-list API"
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│        LLM Orchestrator         │  (decides which models to call)
└────┬───────────┬───────────┬────┘
     │           │           │
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Claude  │ │ Gemini  │ │DeepSeek │
└────┬────┘ └────┬────┘ └────┬────┘
     │           │           │
     ▼           ▼           ▼
  (Design)   (Diagram)    (Code)
     │           │           │
     └────► Merge & Rank ◄───┘
                 │
                 ▼
┌─────────────────────────────────┐
│       Consolidated Output       │  ← Unified Markdown with design notes,
└─────────────────────────────────┘    diagrams, and ready-to-run code snippets

Building the LLM Orchestrator in Python

The following framework-agnostic implementation uses httpx for asynchronous HTTP requests. Configure the ENDPOINTS and HEADERS dictionaries with your actual API URLs and authentication tokens.

import asyncio
import httpx
from typing import Dict, List

# ──────────────────────────────────────
# 1️⃣  LLM endpoint configuration
# ──────────────────────────────────────
ENDPOINTS = {
    "claude": "https://api.anthropic.com/v1/messages",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent",
    "deepseek": "https://api.deepseek.com/v1/chat/completions",
}
# Placeholder keys for illustration; load real keys from environment variables.
HEADERS = {
    # Anthropic's Messages API also requires an anthropic-version header.
    "claude": {"x-api-key": "YOUR_CLAUDE_KEY", "anthropic-version": "2023-06-01"},
    "gemini": {"x-goog-api-key": "YOUR_GEMINI_KEY"},
    "deepseek": {"Authorization": "Bearer YOUR_DEEPSEEK_KEY"},
}

# ──────────────────────────────────────
# 2️⃣  Prompt templates tailored to each model
# ──────────────────────────────────────
PROMPTS = {
    "claude": lambda task: {
        "model": "claude-3-7-sonnet-20250219",  # Claude 3.7 Sonnet
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": f"Design a high-level architecture for: {task}"}],
    },
    # Gemini takes the model name from the endpoint URL, not the request body.
    "gemini": lambda task: {
        "contents": [{"role": "user", "parts": [{"text": f"Create a UML diagram for: {task}"}]}],
    },
    "deepseek": lambda task: {
        "model": "deepseek-chat",  # DeepSeek's API name for the V3 chat model
        "messages": [{"role": "user", "content": f"Write production-ready Python code for: {task}"}],
    },
}

# ──────────────────────────────────────
# 3️⃣  Async helper to call a single LLM
# ──────────────────────────────────────

async def call_llm(name: str, task: str) -> Dict:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            ENDPOINTS[name],
            headers=HEADERS[name],
            json=PROMPTS[name](task),
            timeout=30,
        )
        resp.raise_for_status()
        return {name: resp.json()}

# ──────────────────────────────────────
# 4️⃣  Orchestrator – fire calls in parallel
# ──────────────────────────────────────

async def orchestrate(task: str) -> List[Dict]:
    tasks = [
        call_llm("claude", task),
        call_llm("gemini", task),
        call_llm("deepseek", task),
    ]
    results = await asyncio.gather(*tasks, return_exceptions=False)
    return results

# ──────────────────────────────────────
# 5️⃣  Simple merger – extract each model's text into one Markdown document
# ──────────────────────────────────────
def merge_results(raw: List[Dict]) -> str:
    sections = []
    for entry in raw:
        name, payload = next(iter(entry.items()))
        if name == "claude":
            # Anthropic returns a list of content blocks; take the first text block.
            sections.append(f"## Design (Claude)\n{payload['content'][0]['text']}")
        elif name == "gemini":
            # generateContent returns text parts, so the diagram arrives as text
            # (e.g. PlantUML or Mermaid source), not as an image.
            diagram = payload["candidates"][0]["content"]["parts"][0]["text"]
            sections.append(f"## Diagram (Gemini)\n{diagram}")
        elif name == "deepseek":
            code = payload["choices"][0]["message"]["content"]
            sections.append(f"## Implementation (DeepSeek)\n```python\n{code}\n```")
    return "\n\n".join(sections)

# ──────────────────────────────────────
# 6️⃣  Run example
# ──────────────────────────────────────

if __name__ == "__main__":
    task_description = "Add cursor-based pagination to a FastAPI users endpoint"
    raw_responses = asyncio.run(orchestrate(task_description))
    final_md = merge_results(raw_responses)
    print(final_md)

Step-by-Step Implementation Breakdown

Step	Action	Reason
1️⃣	Define each model's endpoint and authentication	Keeps secrets out of the logic (store them in environment variables)
2️⃣	Create model-specific prompts	Leverages each LLM's unique strength
3️⃣	`call_llm` performs a single HTTP request	Isolates networking concerns
4️⃣	`orchestrate` runs all three calls concurrently	Minimises overall latency via `asyncio.gather`
5️⃣	`merge_results` normalises disparate responses	Gives the developer a coherent Markdown view
6️⃣	Run with a real task	Demonstrates end-to-end usage
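
The merger in the listing concatenates one section per model; it does not yet de-duplicate or rank, which the Result Aggregation stage calls for. A minimal sketch of those two steps follows. The Suggestion type, the dedupe_and_rank function, and especially the confidence field are hypothetical: none of the three APIs returns a confidence score, so it would have to come from your own scoring step.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Suggestion:
    model: str
    text: str
    confidence: float  # hypothetical score in [0, 1], supplied by your own scorer


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so near-identical texts compare equal."""
    return " ".join(text.lower().split())


def dedupe_and_rank(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Keep the highest-confidence copy of each suggestion, best first."""
    best: Dict[str, Suggestion] = {}
    for s in suggestions:
        key = _normalise(s.text)
        if key not in best or s.confidence > best[key].confidence:
            best[key] = s
    return sorted(best.values(), key=lambda s: s.confidence, reverse=True)


if __name__ == "__main__":
    ranked = dedupe_and_rank([
        Suggestion("claude", "Use cursor pagination", 0.7),
        Suggestion("deepseek", "use  cursor PAGINATION", 0.9),
        Suggestion("gemini", "Add an index", 0.8),
    ])
    for s in ranked:
        print(f"{s.confidence:.1f} {s.model}: {s.text}")
```

Exact-match de-duplication after normalisation only catches trivially repeated suggestions; paraphrases would need a similarity measure such as embeddings.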

You can extend this skeleton with caching (e.g., diskcache or Redis) to avoid repeated calls for identical tasks, result ranking using a tiny meta-model that scores relevance, and error-handling that falls back to a secondary model when one service is down.

Key Repository Files and Resources

The cyfyifanchen/one-person-company repository provides the foundational knowledge for implementing this multi-LLM strategy:

File	Why it matters	Link
`README.md`	The central catalogue of AI tools, including the LLM table and the visual illustration of the LLM section. Provides the authoritative list of recommended models and the high-level rationale for their inclusion.	README.md
`assets/jpg/llm.jpg`	The banner image that visually groups the three LLMs. Helpful when building documentation or UI that references the repo's branding.	assets/jpg/llm.jpg
`assets/gif/banner-cape.gif`	The animated hero banner showing the repo's "one-person-company" theme. Gives context for the repo's purpose (a personal AI toolbox) and can be reused in internal wikis.	assets/gif/banner-cape.gif

These files together explain what models to use, why they were chosen, and how they fit into a broader personal-productivity workflow. By combining them with the orchestration pattern above, developers can get the best of Claude, Gemini, and DeepSeek in a single, streamlined development loop.

Summary

Claude 3.7 Sonnet excels at high-level architectural reasoning and brainstorming, making it ideal for initial design phases in your development workflow.
Gemini 2.5 Pro provides multimodal capabilities for generating diagrams and structured problem-solving outputs that complement text-based code.
DeepSeek-V3 delivers production-ready code generation and rigorous edge-case testing, catching implementation details that other models might miss.
An LLM Orchestrator using Python's asyncio and httpx can parallelize calls to all three models, significantly reducing latency compared to sequential execution.
The Result Aggregation layer normalizes disparate API responses into a unified Markdown document, giving developers a single coherent view of design notes, diagrams, and executable code snippets.

Frequently Asked Questions

What is the primary advantage of combining multiple LLMs instead of using a single model?

Combining multiple LLMs allows you to leverage specific architectural strengths: Claude handles high-level reasoning, Gemini manages multimodal outputs like diagrams, and DeepSeek focuses on code precision. This multi-LLM development workflow reduces blind spots that single models might have while providing redundant verification paths for critical code sections.

How does the orchestrator handle authentication for multiple LLM APIs securely?

Store API keys in environment variables or a secret manager and inject them through the HEADERS dictionary, so credentials stay out of the request logic. The YOUR_*_KEY strings in the Python example are placeholders, not a recommendation to hardcode keys. Each provider expects its own header: x-api-key for Claude, x-goog-api-key for Gemini, and an Authorization Bearer token for DeepSeek.
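
One way to build the HEADERS dictionary from the environment is sketched below. The environment variable names and the build_headers function are assumptions made for illustration; use whatever names your deployment defines.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, one per provider.
ENV_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def build_headers(env: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, str]]:
    """Build per-provider auth headers, failing fast if any key is missing."""
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "claude": {
            "x-api-key": env[ENV_VARS["claude"]],
            "anthropic-version": "2023-06-01",
        },
        "gemini": {"x-goog-api-key": env[ENV_VARS["gemini"]]},
        "deepseek": {"Authorization": f"Bearer {env[ENV_VARS['deepseek']]}"},
    }
```

Calling build_headers() once at startup surfaces a missing key immediately instead of as a 401 response in the middle of a run.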

Can this multi-LLM architecture work with synchronous codebases?

While the example uses asyncio for parallel execution, you can adapt the pattern for synchronous codebases by using threading or sequential calls. Parallel execution is still recommended: three sequential calls take roughly the sum of the three response times, whereas the concurrent asyncio.gather approach in the orchestrate() function takes about as long as the slowest call.
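
A thread-based variant for synchronous codebases might look like the sketch below. The orchestrate_sync and fake_call names are hypothetical; `call` is assumed to be any blocking function with the same (name, task) signature as call_llm, for example one built on a synchronous HTTP client. The fake_call stub only simulates latency so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def orchestrate_sync(call: Callable[[str, str], Dict], task: str) -> List[Dict]:
    """Run one blocking call per model in its own thread."""
    names = ["claude", "gemini", "deepseek"]
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # Executor.map returns results in input order, not completion order.
        return list(pool.map(lambda name: call(name, task), names))


def fake_call(name: str, task: str) -> Dict:
    """Stand-in for a blocking HTTP request; sleeps to simulate latency."""
    time.sleep(0.2)
    return {name: f"response to {task}"}


if __name__ == "__main__":
    start = time.perf_counter()
    results = orchestrate_sync(fake_call, "Add pagination")
    elapsed = time.perf_counter() - start
    print(f"{len(results)} responses in {elapsed:.2f}s")
```

Because the three simulated calls overlap, the total time should be close to one call's latency rather than three times that.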

What file in the one-person-company repository contains the recommended LLM combinations?

The authoritative list of recommended models and their specific use cases is documented in the README.md file at lines 69-73, which includes the comparison table of Claude, Gemini, and DeepSeek. The visual representation of this multi-LLM strategy is also available in assets/jpg/llm.jpg.
