# Running AutoGen Agents with Distributed Runtime Across Multiple Machines: A Complete Guide

> Learn to run AutoGen agents with distributed runtime across multiple machines. Leverage gRPC for seamless multi-machine deployment enabling scalable AI agent applications.

- Repository: [Microsoft/autogen](https://github.com/microsoft/autogen)
- Tags: how-to-guide
- Published: 2026-03-07

---

**AutoGen's distributed runtime uses a gRPC-based gateway to let you deploy agents across multiple machines while maintaining the same `AgentRuntime` API and message-handler semantics as single-process execution.**

The **microsoft/autogen** repository provides a distributed agent runtime that lets you run AutoGen agents across multiple machines without rewriting your agent logic. Its gRPC host-worker architecture lets you scale agent workloads horizontally while preserving the `send_message`, `publish_message`, and topic-subscription patterns used in single-process applications.

## Architecture of the Distributed Agent Runtime

AutoGen's distributed system abstracts networking complexity through a clean separation between a central gRPC host and lightweight worker runtimes.

### Host and Worker Components

The architecture consists of two primary components defined in the `autogen_ext.runtimes.grpc` package:

- **`GrpcWorkerAgentRuntimeHost`** – A lightweight gRPC server implemented in [`_worker_runtime_host.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host.py) that binds to a network address and brokers messages between workers. It exposes `start()`, `stop()`, and `stop_when_signal()` for lifecycle management.

- **`GrpcWorkerAgentRuntime`** – A client-side runtime implemented in [`_worker_runtime.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime.py) that connects to the host, registers local agents, and maintains a read loop that processes **request**, **response**, and **cloudEvent** messages.

Both components share the same protobuf contracts defined in [`protos/agent_worker.proto`](https://github.com/microsoft/autogen/blob/main/protos/agent_worker.proto) and inherit from the abstract `AgentRuntime` base class in [`application/_runtime.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-core/src/autogen_core/application/_runtime.py).
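The three message kinds handled by the worker's read loop can be pictured as a tagged envelope that is routed by its kind. The sketch below is a simplified, standard-library model and not AutoGen's code: `Envelope`, `read_loop`, and the recording handlers are hypothetical stand-ins for the protobuf `Message` wrapper and the runtime's internal handlers.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class Envelope:
    """Hypothetical stand-in for the protobuf `Message` wrapper."""
    kind: str      # "request", "response", or "cloudEvent"
    payload: Any


Handler = Callable[[Any], Awaitable[None]]


async def read_loop(inbox: "asyncio.Queue[Optional[Envelope]]",
                    handlers: Dict[str, Handler]) -> List[str]:
    """Route each envelope to the handler for its kind; None ends the loop.

    Returns the kinds that had no handler so the caller can log them.
    """
    unknown: List[str] = []
    while True:
        env = await inbox.get()
        if env is None:
            break
        handler = handlers.get(env.kind)
        if handler is None:
            unknown.append(env.kind)
            continue
        await handler(env.payload)
    return unknown


async def demo() -> Dict[str, List[Any]]:
    seen: Dict[str, List[Any]] = {"request": [], "response": [], "cloudEvent": []}

    def recorder(kind: str) -> Handler:
        async def handle(payload: Any) -> None:
            seen[kind].append(payload)
        return handle

    inbox: "asyncio.Queue[Optional[Envelope]]" = asyncio.Queue()
    for env in (Envelope("request", "ping"),
                Envelope("cloudEvent", "topic update"),
                Envelope("response", "pong")):
        inbox.put_nowait(env)
    inbox.put_nowait(None)  # sentinel: stop reading
    await read_loop(inbox, {kind: recorder(kind) for kind in seen})
    return seen
```

Running `asyncio.run(demo())` sorts the three envelopes into their respective buckets. A real runtime reads from a gRPC stream rather than an in-memory queue, but the dispatch-by-kind shape is the same idea.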

### Message Flow Across the Network

When agents run on the distributed runtime, messages follow a standardized flow:

1. **Agent to Runtime** – An agent calls `runtime.publish_message()`. The `GrpcWorkerAgentRuntime` serializes the payload and sends it to the host as a `cloudEvent` message. Direct `send_message()` calls travel as an `RpcRequest` instead.

2. **Host to Servicer** – The `GrpcWorkerAgentRuntimeHostServicer` (implemented in [`_worker_runtime_host_servicer.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py)) receives the incoming message and hands it to the host's internal routing logic.

3. **Router to Subscribers** – The host dispatches the message to locally registered agents or publishes it to a topic, which broadcasts to all subscribed workers.

4. **Worker to Agent** – Remote workers receive a `Message` of type `cloudEvent`. The worker's read loop processes the event and invokes the corresponding `message_handler` on the target agent.

Because both local and distributed runtimes implement the same `AgentRuntime` interface, you use identical `@message_handler` decorators and agent logic regardless of deployment topology.
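The four steps above can be modeled in a few lines of standard-library Python. This is a toy, in-process model with no gRPC involved: `ToyHost` and `ToyWorker` are hypothetical names, and the real host tracks subscriptions and connections in far more detail.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, Dict, List


class ToyHost:
    """Keeps topic subscriptions and fans published events out to workers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List["ToyWorker"]] = defaultdict(list)

    def subscribe(self, topic: str, worker: "ToyWorker") -> None:
        self._subscribers[topic].append(worker)

    async def publish(self, topic: str, payload: str) -> None:
        # Step 3: deliver the event to every worker subscribed to the topic
        for worker in self._subscribers[topic]:
            await worker.inbox.put((topic, payload))


class ToyWorker:
    """Publishes through the host and handles events from its own inbox."""

    def __init__(self, name: str, host: ToyHost) -> None:
        self.name = name
        self.host = host
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.handled: List[str] = []

    async def publish(self, topic: str, payload: str) -> None:
        # Steps 1 and 2: hand the message to the host
        await self.host.publish(topic, payload)

    async def drain(self) -> None:
        # Step 4: invoke the "handler" for each queued event
        while not self.inbox.empty():
            topic, payload = self.inbox.get_nowait()
            self.handled.append(f"{topic}:{payload}")


async def demo() -> Dict[str, List[str]]:
    host = ToyHost()
    writer, editor, ui = (ToyWorker(n, host) for n in ("writer", "editor", "ui"))
    host.subscribe("chat", editor)
    host.subscribe("chat", ui)
    await writer.publish("chat", "draft v1")
    for worker in (writer, editor, ui):
        await worker.drain()
    return {w.name: w.handled for w in (writer, editor, ui)}
```

In this model only the subscribed workers (`editor` and `ui`) see the event. The publisher's code never needs to know which machines the subscribers live on, which is the property the shared `AgentRuntime` interface is meant to preserve.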

## Setting Up the gRPC Host Server

Start by deploying the gRPC host on a machine whose address every worker can reach over the network. A public IP is not required; a private address works as long as the workers can route to it.

Create a host script that instantiates `GrpcWorkerAgentRuntimeHost` and binds to all interfaces:

```python
# host.py
import asyncio

from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntimeHost


async def main() -> None:
    # Bind to all interfaces so workers on other machines can connect
    host = GrpcWorkerAgentRuntimeHost(address="0.0.0.0:50051")
    # start() schedules the server as a background task, so it must be
    # called while an event loop is running
    host.start()
    # Keep the process alive until SIGINT (Ctrl-C) or SIGTERM is received
    await host.stop_when_signal()


if __name__ == "__main__":
    asyncio.run(main())
```

The `stop_when_signal()` method ensures graceful shutdown when the process receives `SIGINT` or `SIGTERM`, allowing pending messages to complete before the server closes connections.
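To make the shutdown pattern concrete, here is a minimal sketch of how a stop-on-signal helper can be built with `asyncio`. It is an illustration under assumptions, not the library's implementation: `wait_for_signal` and `serve_until_signal` are hypothetical names, and `loop.add_signal_handler` is only available on Unix event loops.

```python
import asyncio
import signal
from typing import Awaitable, Callable, Sequence


async def wait_for_signal(
    signals: Sequence[signal.Signals] = (signal.SIGTERM, signal.SIGINT),
) -> signal.Signals:
    """Suspend until one of `signals` arrives, then return it."""
    loop = asyncio.get_running_loop()
    received: "asyncio.Future[signal.Signals]" = loop.create_future()

    def _on_signal(sig: signal.Signals) -> None:
        # Ignore repeated signals once the first one has been recorded
        if not received.done():
            received.set_result(sig)

    for sig in signals:
        loop.add_signal_handler(sig, _on_signal, sig)
    try:
        return await received
    finally:
        # Restore default handling so a second Ctrl-C is not swallowed
        for sig in signals:
            loop.remove_signal_handler(sig)


async def serve_until_signal(
    stop: Callable[[], Awaitable[None]],
    signals: Sequence[signal.Signals] = (signal.SIGTERM, signal.SIGINT),
) -> signal.Signals:
    """Wait for a shutdown signal, then run the caller's async `stop` routine."""
    sig = await wait_for_signal(signals)
    await stop()
    return sig
```

The design point is that the signal handler only records the event; the actual shutdown work runs afterwards as ordinary async code, where it can await in-flight operations before closing connections.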

## Connecting Worker Runtimes from Remote Machines

Once the host is running, connect worker runtimes from other machines using `GrpcWorkerAgentRuntime`. Workers register agent factories and subscribe to topics exactly as they would with a local runtime.

Here is a minimal worker implementation that registers an echo agent:

```python
# worker.py
import asyncio

from autogen_core import (
    MessageContext,
    RoutedAgent,
    TypeSubscription,
    message_handler,
    try_get_known_serializers_for_type,
)
from autogen_core.models import AssistantMessage, UserMessage
from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntime


class EchoAgent(RoutedAgent):
    """Echoes back the received user message in upper case."""

    def __init__(self) -> None:
        super().__init__("Echo")

    @message_handler
    async def handle_user(self, msg: UserMessage, ctx: MessageContext) -> None:
        # UserMessage.content may also be a list of parts; this example handles text only
        text = msg.content if isinstance(msg.content, str) else str(msg.content)
        reply = AssistantMessage(content=text.upper(), source=self.id.type)
        if ctx.topic_id is not None:
            await self.publish_message(reply, ctx.topic_id)  # reply on the same topic


async def main() -> None:
    # Connect to the host started above
    runtime = GrpcWorkerAgentRuntime(host_address="host.example.com:50051")
    await runtime.start()
    # The reply type is not handled by any local agent, so register its serializer explicitly
    runtime.add_message_serializer(try_get_known_serializers_for_type(AssistantMessage))
    # Register the agent factory, then subscribe the agent type to the "chat" topic
    await EchoAgent.register(runtime, "echo", lambda: EchoAgent())
    await runtime.add_subscription(TypeSubscription(topic_type="chat", agent_type="echo"))
    # Keep the runtime alive until SIGINT or SIGTERM
    await runtime.stop_when_signal()


if __name__ == "__main__":
    asyncio.run(main())
```

The `GrpcWorkerAgentRuntime` handles connection retries, maintains a read loop for incoming messages, and serializes payloads using the same registry as the host. You can deploy this worker script on any machine that has network access to the host's port 50051.
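Before starting a worker on a new machine, it can save debugging time to confirm that the host's port is reachable at all. The helper below is a small standard-library sketch; `can_reach` is a hypothetical name, and a successful TCP connection only shows that something is listening, not that the gRPC service itself is healthy.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts
        return False
```

For the example deployment, a worker machine would call `can_reach("host.example.com", 50051)`. A `False` result usually points to a firewall rule, a wrong address, or a host process that is not running.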

## Running the Distributed Group Chat Sample

The microsoft/autogen repository includes a complete demonstration of the distributed runtime in the `core_distributed-group-chat` sample, which orchestrates a writer, editor, manager, and UI agent across separate processes.

Execute the sample by launching each component in a separate terminal:

```bash
# Terminal 1 – start the host
python python/samples/core_distributed-group-chat/run_host.py

# Terminal 2 – start the UI (its own process, connected to the host like the other workers)
python python/samples/core_distributed-group-chat/run_ui.py

# Terminal 3 – start writer agent (remote worker)
python python/samples/core_distributed-group-chat/run_writer_agent.py

# Terminal 4 – start editor agent (remote worker)
python python/samples/core_distributed-group-chat/run_editor_agent.py

# Terminal 5 – start the group-chat manager (remote worker)
python python/samples/core_distributed-group-chat/run_group_chat_manager.py
```

Each script imports either `GrpcWorkerAgentRuntimeHost` or `GrpcWorkerAgentRuntime` and uses the standard `message_handler` pattern. The sample source files demonstrate how to structure agent factories, handle topics, and manage distributed state in a real-world scenario.

## Key Implementation Files and Source References

These source files define the core behavior of the distributed runtime:

| File | Purpose |
|------|---------|
| [`python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host.py) | Implements `GrpcWorkerAgentRuntimeHost`, the gRPC server that brokers messages between workers. |
| [`python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py) | Implements `GrpcWorkerAgentRuntimeHostServicer`, handling incoming RPC calls and forwarding to the host's router. |
| [`python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime.py) | Implements `GrpcWorkerAgentRuntime`, the client-side runtime that connects to the host and manages the read loop. |
| [`python/packages/autogen-core/src/autogen_core/application/_runtime.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-core/src/autogen_core/application/_runtime.py) | Defines the abstract `AgentRuntime` base class shared by local and distributed implementations. |
| `protos/agent_worker.proto` | Protobuf definitions for `Message`, `RpcRequest`, `RpcResponse`, and `CloudEvent` wire formats. |
| [`python/samples/core_distributed-group-chat/_agents.py`](https://github.com/microsoft/autogen/blob/main/python/samples/core_distributed-group-chat/_agents.py) | Agent implementations demonstrating distributed patterns. |
| [`python/samples/core_distributed-group-chat/run_host.py`](https://github.com/microsoft/autogen/blob/main/python/samples/core_distributed-group-chat/run_host.py) | Entry point for launching the gRPC host. |
| `python/samples/core_distributed-group-chat/run_*.py` | Worker launch scripts for individual agents. |

## Summary

The distributed runtime lets AutoGen agent workloads scale horizontally across multiple machines while preserving the single-process programming model. Key takeaways:

- **GrpcWorkerAgentRuntimeHost** acts as a central message broker using gRPC, implemented in [`_worker_runtime_host.py`](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host.py).
- **GrpcWorkerAgentRuntime** connects remote workers to the host, providing the same `AgentRuntime` API as local execution.
- Agents use identical `@message_handler` decorators regardless of whether they run in-process or across the network.
- The `core_distributed-group-chat` sample demonstrates a complete multi-machine deployment with writer, editor, manager, and UI agents.
- Protobuf contracts in `agent_worker.proto` enable cross-language interoperability between Python and .NET agents.

## Frequently Asked Questions

### Do I need to install extra dependencies for the distributed runtime?

Yes. The gRPC runtime ships in the `autogen-ext` package behind the `grpc` extra, so install it with `pip install "autogen-ext[grpc]"`. The extra pulls in `grpcio`, which provides the gRPC client and server implementations that `GrpcWorkerAgentRuntime` and `GrpcWorkerAgentRuntimeHost` depend on.
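A quick way to confirm that an environment is ready is to check that the relevant top-level modules can be found. This is a small sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names passed to it (`grpc` for the `grpcio` distribution, plus `autogen_core` and `autogen_ext`) are the import names you would expect to need.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(["grpc", "autogen_core", "autogen_ext"])` on each machine before deployment returns an empty list when everything is installed, and otherwise names what is absent.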

### Can I mix .NET and Python agents in the same distributed deployment?

Absolutely. The host can be a .NET process using `Microsoft.AutoGen.RuntimeGateway.Grpc` while workers run Python (or vice versa) as long as both use the same protobuf contracts defined in `agent_worker.proto` and `cloudevent.proto`. The serialization registry handles JSON or protobuf payloads transparently across language boundaries.

### How do I choose between JSON and Protobuf payload serialization?

When constructing a `GrpcWorkerAgentRuntime`, set `payload_serialization_format` to either `JSON_DATA_CONTENT_TYPE` (the default) or `PROTOBUF_DATA_CONTENT_TYPE`. JSON is human-readable and easier to debug, while Protobuf offers better performance and smaller payloads for large binary data or high-throughput scenarios.
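To get a feel for the size trade-off, the sketch below encodes the same record as JSON text and as a compact binary layout. The binary side uses the standard-library `struct` module purely as a stand-in for a schema-based binary format; it is not Protobuf, and `Reading` and the helper functions are hypothetical names chosen for illustration.

```python
import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class Reading:
    sensor_id: int
    value: float


def to_json(reading: Reading) -> bytes:
    """Self-describing text encoding: field names travel with every message."""
    return json.dumps(asdict(reading)).encode("utf-8")


def from_json(data: bytes) -> Reading:
    return Reading(**json.loads(data))


# Little-endian uint32 followed by float64: 12 bytes, field names implied by the schema
_LAYOUT = "<Id"


def to_binary(reading: Reading) -> bytes:
    """Schema-based binary encoding: both sides must agree on the layout."""
    return struct.pack(_LAYOUT, reading.sensor_id, reading.value)


def from_binary(data: bytes) -> Reading:
    sensor_id, value = struct.unpack(_LAYOUT, data)
    return Reading(sensor_id, value)
```

Both encodings round-trip the record, but the binary form omits the field names and is therefore shorter, at the cost of being unreadable in logs. The same trade-off motivates choosing between the two content types.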

### What happens if a worker runtime crashes or disconnects?

The host's read loop logs the error but continues serving other connected workers. Workers can be restarted independently; upon reconnection, the host will resend any missed subscriptions and the worker re-registers its agents. However, in-flight messages to the crashed worker may be lost unless you implement application-level acknowledgment patterns.
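One way to approach application-level acknowledgment is to resend a message until the receiver confirms it. The following is a minimal standard-library sketch under stated assumptions: `AckingSender` is a hypothetical class, `send` stands for whatever coroutine actually transmits the message (for example, a wrapper around a runtime call), and the receiver is assumed to call `acknowledge` through some reply path.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Dict, List


class AckingSender:
    """Resends a message until it is acknowledged or attempts run out."""

    def __init__(self, send: Callable[[int, Any], Awaitable[None]],
                 ack_timeout: float = 0.05, max_attempts: int = 3) -> None:
        self._send = send
        self._ack_timeout = ack_timeout
        self._max_attempts = max_attempts
        self._pending: Dict[int, asyncio.Event] = {}
        self._ids = itertools.count(1)

    def acknowledge(self, message_id: int) -> None:
        """Called when the receiver confirms a message; unknown ids are ignored."""
        event = self._pending.get(message_id)
        if event is not None:
            event.set()

    async def deliver(self, payload: Any) -> int:
        """Send `payload` and return the number of attempts it took."""
        message_id = next(self._ids)
        event = asyncio.Event()
        self._pending[message_id] = event
        try:
            for attempt in range(1, self._max_attempts + 1):
                await self._send(message_id, payload)
                try:
                    await asyncio.wait_for(event.wait(), self._ack_timeout)
                    return attempt
                except asyncio.TimeoutError:
                    continue  # no ack in time: resend
            raise TimeoutError(f"message {message_id} was never acknowledged")
        finally:
            del self._pending[message_id]


async def demo() -> int:
    deliveries: List[int] = []

    async def flaky_send(message_id: int, payload: Any) -> None:
        deliveries.append(message_id)
        if len(deliveries) >= 2:  # the first delivery is "lost"
            sender.acknowledge(message_id)

    sender = AckingSender(flaky_send)
    return await sender.deliver("hello")
```

Because a resend can arrive after the original was in fact processed, receivers paired with this pattern should treat the message id as a deduplication key so that handling stays idempotent.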