Running AutoGen Agents with Distributed Runtime Across Multiple Machines: A Complete Guide
AutoGen's distributed runtime uses a gRPC-based gateway to let you deploy agents across multiple machines while maintaining the same AgentRuntime API and message-handler semantics as single-process execution.
The microsoft/autogen repository provides a distributed agent runtime that lets you run AutoGen agents across multiple machines without rewriting your agent logic. With its gRPC host-worker architecture, you can scale agent workloads horizontally while preserving the familiar send_message, publish_message, and subscription patterns used in single-process applications.
Architecture of the Distributed Agent Runtime
AutoGen's distributed system abstracts networking complexity through a clean separation between a central gRPC host and lightweight worker runtimes.
Host and Worker Components
The architecture consists of two primary components defined in the autogen_ext.runtimes.grpc package:
- GrpcWorkerAgentRuntimeHost – A lightweight gRPC server implemented in [_worker_runtime_host.py](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host.py) that binds to a network address and brokers messages between workers. It exposes start(), stop(), and stop_when_signal() for lifecycle management.
- GrpcWorkerAgentRuntime – A client-side runtime implemented in [_worker_runtime.py](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime.py) that connects to the host, registers local agents, and maintains a read loop processing request, response, and cloudEvent messages.
Both components share the same protobuf contracts defined in protos/agent_worker.proto and inherit from the abstract AgentRuntime base class in [application/_runtime.py](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-core/src/autogen_core/application/_runtime.py).
Message Flow Across the Network
When running AutoGen agents with distributed runtime across multiple machines, messages follow a standardized flow:
- Agent to Runtime – An agent calls runtime.publish_message(). The GrpcWorkerAgentRuntime serializes the payload and sends an RpcRequest to the host.
- Host to Servicer – The GrpcWorkerAgentRuntimeHostServicer (implemented in [_worker_runtime_host_servicer.py](https://github.com/microsoft/autogen/blob/main/python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py)) receives the RPC call and forwards it to the host's internal MessageRouter.
- Router to Subscribers – The host dispatches the message to locally registered agents or publishes it to a topic, which broadcasts to all subscribed workers.
- Worker to Agent – Remote workers receive a Message of type cloudEvent. The worker's read loop processes the event and invokes the corresponding message_handler on the target agent.
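The four steps above can be pictured with a toy in-memory model. The sketch below is not AutoGen's implementation and involves no gRPC; ToyHost and ToyWorker are hypothetical names used only to illustrate how a host can fan a published message out to every worker subscribed to a topic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Stands in for a worker runtime; records what its agents receive."""

    name: str
    received: list = field(default_factory=list)

    def deliver(self, agent_type: str, topic: str, payload: str) -> None:
        # A real worker would look up the agent and invoke its message handler.
        self.received.append((agent_type, topic, payload))


class ToyHost:
    """Stands in for the host: tracks subscriptions and fans out messages."""

    def __init__(self) -> None:
        # topic type -> list of (worker, agent_type) pairs subscribed to it
        self._subscriptions = {}

    def subscribe(self, worker: ToyWorker, agent_type: str, topic: str) -> None:
        self._subscriptions.setdefault(topic, []).append((worker, agent_type))

    def publish(self, topic: str, payload: str) -> int:
        """Deliver payload to every subscriber of topic; return the delivery count."""
        subscribers = self._subscriptions.get(topic, [])
        for worker, agent_type in subscribers:
            worker.deliver(agent_type, topic, payload)
        return len(subscribers)


host = ToyHost()
writer = ToyWorker("writer-machine")
editor = ToyWorker("editor-machine")
host.subscribe(writer, "writer", "chat")
host.subscribe(editor, "editor", "chat")

# One publish call reaches both workers because both subscribed to "chat".
delivered = host.publish("chat", "draft v1")
```

In the real runtime the deliver step crosses the network, but the subscription table on the host plays the same role: it decides which workers see which topic.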
Because both local and distributed runtimes implement the same AgentRuntime interface, you use identical @message_handler decorators and agent logic regardless of deployment topology.
Setting Up the gRPC Host Server
To run AutoGen agents across multiple machines, start by deploying the gRPC host on a machine whose address and port the worker machines can reach. The address does not need to be public unless workers connect over the internet.
Create a host script that instantiates GrpcWorkerAgentRuntimeHost and binds to all interfaces:
    # host.py
    import asyncio

    from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntimeHost


    async def main() -> None:
        # Bind to all interfaces so workers on other machines can connect
        host = GrpcWorkerAgentRuntimeHost(address="0.0.0.0:50051")
        # Start serving from inside the running event loop
        host.start()
        # Keep the process alive until Ctrl-C (or SIGTERM) is received
        await host.stop_when_signal()


    if __name__ == "__main__":
        asyncio.run(main())
The stop_when_signal() method ensures graceful shutdown when the process receives SIGINT or SIGTERM, allowing pending messages to complete before the server closes connections.
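Before starting workers, it can help to confirm that the host's port is reachable from the worker machine. The following is a minimal sketch using only the standard library; host_is_reachable is a hypothetical helper, not part of AutoGen, and a successful TCP connection only shows that the port is open, not that the gRPC service behind it is healthy.

```python
import socket


def host_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

On a worker machine you might call host_is_reachable("host.example.com", 50051) and check firewall or security-group rules if it returns False.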
Connecting Worker Runtimes from Remote Machines
Once the host is running, connect worker runtimes from other machines using GrpcWorkerAgentRuntime. Workers register agent factories and subscribe to topics exactly as they would with a local runtime.
Here is a minimal worker implementation that registers an echo agent:
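    # worker.py
    import asyncio

    from autogen_core import (
        MessageContext,
        RoutedAgent,
        TypeSubscription,
        message_handler,
        try_get_known_serializers_for_type,
    )
    from autogen_core.models import AssistantMessage, UserMessage
    from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntime


    class EchoAgent(RoutedAgent):
        """Echoes back the received user message in upper case."""

        def __init__(self) -> None:
            super().__init__("Echo")

        # message_handler expects fully annotated handlers, including the return type
        @message_handler
        async def handle_user(self, msg: UserMessage, ctx: MessageContext) -> None:
            # UserMessage content may be a list of parts; only echo plain text
            if not isinstance(msg.content, str) or ctx.topic_id is None:
                return
            reply = AssistantMessage(content=msg.content.upper(), source="echo")
            await self.publish_message(reply, ctx.topic_id)  # reply on same topic


    async def main() -> None:
        # Connect to the host started above
        runtime = GrpcWorkerAgentRuntime(host_address="host.example.com:50051")
        await runtime.start()
        # Register a serializer for the outgoing type, which the agent does not handle itself
        runtime.add_message_serializer(try_get_known_serializers_for_type(AssistantMessage))
        # Register the agent factory, then subscribe the "echo" type to the "chat" topic
        await EchoAgent.register(runtime, "echo", lambda: EchoAgent())
        await runtime.add_subscription(TypeSubscription(topic_type="chat", agent_type="echo"))
        # Keep the runtime alive
        await runtime.stop_when_signal()


    if __name__ == "__main__":
        asyncio.run(main())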
The GrpcWorkerAgentRuntime handles connection retries, maintains a read loop for incoming messages, and serializes payloads using the same registry as the host. You can deploy this worker script on any machine that has network access to the host's port 50051.
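If your own startup steps can also fail transiently (for example, a worker that boots before the host is up), an application-level retry wrapper is one option. This is a generic sketch with exponential backoff, not something AutoGen provides: retry_async is a hypothetical helper, and which exception types are worth retrying is an assumption you would adjust for your setup.

```python
import asyncio


async def retry_async(operation, attempts=5, base_delay=0.5, retry_on=(OSError,)):
    """Await operation() until it succeeds, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next try
            await asyncio.sleep(base_delay * 2 ** attempt)
```

The operation argument is any zero-argument callable that returns an awaitable, so a connection step can be wrapped in a lambda or a small coroutine function.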
Running the Distributed Group Chat Sample
The microsoft/autogen repository includes a complete demonstration of running AutoGen agents with distributed runtime across multiple machines in the core_distributed-group-chat sample. This example orchestrates a writer, editor, manager, and UI agent across separate processes.
Execute the sample by launching each component in a separate terminal:
# Terminal 1 – start the host
python python/samples/core_distributed-group-chat/run_host.py
# Terminal 2 – start the UI agent (a separate worker process that publishes UI messages)
python python/samples/core_distributed-group-chat/run_ui.py
# Terminal 3 – start writer agent (remote worker)
python python/samples/core_distributed-group-chat/run_writer_agent.py
# Terminal 4 – start editor agent (remote worker)
python python/samples/core_distributed-group-chat/run_editor_agent.py
# Terminal 5 – start the group-chat manager (remote worker)
python python/samples/core_distributed-group-chat/run_group_chat_manager.py
Each script imports either GrpcWorkerAgentRuntimeHost or GrpcWorkerAgentRuntime and uses the standard message_handler pattern. The sample source files demonstrate how to structure agent factories, handle topics, and manage distributed state in a real-world scenario.
Key Implementation Files and Source References
When running AutoGen agents with distributed runtime across multiple machines, these source files define the core behavior:
| File | Purpose |
|---|---|
| python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host.py | Implements GrpcWorkerAgentRuntimeHost, the gRPC server that brokers messages between workers. |
| python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime_host_servicer.py | Implements GrpcWorkerAgentRuntimeHostServicer, handling incoming RPC calls and forwarding to the host's router. |
| python/packages/autogen-ext/src/autogen_ext/runtimes/grpc/_worker_runtime.py | Implements GrpcWorkerAgentRuntime, the client-side runtime that connects to the host and manages the read loop. |
| python/packages/autogen-core/src/autogen_core/application/_runtime.py | Defines the abstract AgentRuntime base class shared by local and distributed implementations. |
| protos/agent_worker.proto | Protobuf definitions for Message, RpcRequest, RpcResponse, and CloudEvent wire formats. |
| python/samples/core_distributed-group-chat/_agents.py | Agent implementations demonstrating distributed patterns. |
| python/samples/core_distributed-group-chat/run_host.py | Entry point for launching the gRPC host. |
| python/samples/core_distributed-group-chat/run_*.py | Worker launch scripts for individual agents. |
Summary
Running AutoGen agents with distributed runtime across multiple machines enables horizontal scaling of agent workloads while preserving the single-process programming model. Key takeaways include:
- GrpcWorkerAgentRuntimeHost acts as a central message broker using gRPC, implemented in _worker_runtime_host.py.
- GrpcWorkerAgentRuntime connects remote workers to the host, providing the same AgentRuntime API as local execution.
- Agents use identical @message_handler decorators regardless of whether they run in-process or across the network.
- The core_distributed-group-chat sample demonstrates a complete multi-machine deployment with writer, editor, manager, and UI agents.
- Protobuf contracts in agent_worker.proto enable cross-language interoperability between Python and .NET agents.
Frequently Asked Questions
Do I need to install extra dependencies for the distributed runtime?
Yes. GrpcWorkerAgentRuntime and GrpcWorkerAgentRuntimeHost live in the autogen-ext package and depend on grpcio for the gRPC client and server implementations. Install both with pip install "autogen-ext[grpc]". The grpcio-tools package is only needed if you regenerate the protobuf stubs yourself.
Can I mix .NET and Python agents in the same distributed deployment?
Absolutely. The host can be a .NET process using Microsoft.AutoGen.RuntimeGateway.Grpc while workers run Python (or vice versa) as long as both use the same protobuf contracts defined in agent_worker.proto and cloudevent.proto. The serialization registry handles JSON or protobuf payloads transparently across language boundaries.
How do I choose between JSON and Protobuf payload serialization?
When constructing a GrpcWorkerAgentRuntime, set the payload_serialization_format argument to either JSON_DATA_CONTENT_TYPE (the default) or PROTOBUF_DATA_CONTENT_TYPE. JSON is human-readable and easier to debug, while Protobuf typically offers better performance and smaller payloads for large binary data or high-throughput scenarios.
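To get a feel for the size trade-off, the sketch below encodes the same record once as JSON text and once as a fixed-layout binary struct. The struct module is only a stand-in here: Protobuf uses its own tagged, variable-length wire format, so real sizes will differ, and the reading record is a made-up example.

```python
import json
import struct

# Hypothetical payload used only for this comparison
reading = {"sensor_id": 42, "temperature": 21.5, "ok": True}

# Text encoding: field names travel with every message
json_bytes = json.dumps(reading).encode("utf-8")

# Binary encoding: "<Id?" = little-endian unsigned int (4 bytes),
# double (8 bytes), bool (1 byte); field names live in the schema instead
binary_bytes = struct.pack(
    "<Id?", reading["sensor_id"], reading["temperature"], reading["ok"]
)

# Decoding the binary form requires knowing the layout in advance
sensor_id, temperature, ok = struct.unpack("<Id?", binary_bytes)
```

The binary form is smaller because the schema is agreed on out of band, which is also why it is harder to inspect by eye when debugging.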
What happens if a worker runtime crashes or disconnects?
The host's read loop logs the error but continues serving other connected workers. Workers can be restarted independently; upon reconnection, the host will resend any missed subscriptions and the worker re-registers its agents. However, in-flight messages to the crashed worker may be lost unless you implement application-level acknowledgment patterns.
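One way to approach application-level acknowledgment is for the sender to remember each message until the receiver confirms it, and to resend anything left unconfirmed for too long. The sketch below shows only that bookkeeping; AckTracker is a hypothetical class, not an AutoGen API, and timestamps are passed in explicitly so the logic stays deterministic.

```python
import itertools


class AckTracker:
    """Tracks sent messages until the receiver acknowledges them."""

    def __init__(self, timeout: float) -> None:
        self._timeout = timeout
        self._ids = itertools.count(1)
        self._pending = {}  # message id -> (sent_at, payload)

    def record_send(self, payload: str, now: float) -> int:
        """Remember a message that was just sent and return its id."""
        msg_id = next(self._ids)
        self._pending[msg_id] = (now, payload)
        return msg_id

    def acknowledge(self, msg_id: int) -> bool:
        """Forget a message once acknowledged; False if it was not pending."""
        return self._pending.pop(msg_id, None) is not None

    def due_for_resend(self, now: float) -> list:
        """Return (id, payload) pairs that have waited at least timeout."""
        return [
            (msg_id, payload)
            for msg_id, (sent_at, payload) in self._pending.items()
            if now - sent_at >= self._timeout
        ]
```

In an agent, the acknowledgment itself would be an ordinary message type handled by the sender, and the receiver should tolerate duplicates because a resend can race with a late acknowledgment.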