Performance Implications of Running Langflow with Gunicorn versus Uvicorn Workers

Question

How do Gunicorn and Uvicorn compare for Langflow performance? Which server setup gives the best parallelism for CPU-bound workloads and production use on Linux, macOS, and Windows?

Accepted Answer

On Linux and macOS, Langflow runs under Gunicorn, which provides multi-process parallelism for CPU-bound workloads and production-grade process management at the cost of higher memory use. On Windows, it runs under plain Uvicorn, a lighter single-process model that uses less memory but is limited to one CPU core.

The langflow-ai/langflow repository implements platform-specific server architectures that directly impact scalability and resource consumption. Understanding the performance implications of running Langflow with Gunicorn versus Uvicorn workers is essential for optimizing deployments, as the framework automatically selects Gunicorn process management on Unix-like systems while defaulting to pure Uvicorn on Windows.

Platform-Specific Architecture

Langflow's entry point in src/backend/base/langflow/__main__.py detects the operating system and selects the appropriate server implementation at runtime.

Linux and macOS: Gunicorn Process Management

On non-Windows platforms, Langflow instantiates a LangflowApplication (a Gunicorn BaseApplication subclass) and launches it in a separate process. This architecture enables true parallelism across multiple CPU cores by forking separate worker processes.

The startup logic at lines 93-100 creates the server:

# Use Gunicorn with LangflowUvicornWorker for non-Windows systems
from langflow.server import LangflowApplication
...
server = LangflowApplication(app, options)
process_manager.webapp_process = Process(target=server.run)
process_manager.webapp_process.start()

Windows: Pure Uvicorn Single-Process

On Windows, the startup path calls uvicorn.run(...) directly, executing the FastAPI application in a single process with a single async event loop. This sidesteps the fact that Windows has no fork() system call, but it limits CPU utilization to a single core.
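The platform branch described above can be sketched in a few lines. This is a minimal illustration, not the code in __main__.py: the function name select_server_mode and its return values are hypothetical, and only platform.system() is a real standard-library call.

```python
import platform
from typing import Optional


def select_server_mode(system_name: Optional[str] = None) -> str:
    """Return "uvicorn" on Windows and "gunicorn" everywhere else.

    Hypothetical helper for illustration; Langflow's real check may differ.
    """
    # platform.system() returns "Windows", "Linux", or "Darwin" (macOS).
    name = system_name if system_name is not None else platform.system()
    return "uvicorn" if name == "Windows" else "gunicorn"
```

Calling select_server_mode() with no argument reports the mode for the current host; passing a name explicitly makes the branch easy to test without switching machines.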

Performance Comparison: Gunicorn vs Uvicorn in Langflow

The choice between worker implementations affects five critical performance characteristics:

Aspect	Gunicorn + Uvicorn Workers	Pure Uvicorn
Concurrency Model	Multiple processes, each running an async Uvicorn loop; utilizes separate CPU cores for true parallelism	Single process with one event loop; concurrency limited to one CPU core
Scalability	Throughput for CPU-bound work scales with the `workers` option (see `get_number_of_workers`), up to roughly the number of available cores; best suited to multi-core machines	Needs Uvicorn's own `--workers` option or external orchestration to use more than one core
Memory Usage	Higher RAM consumption (each worker duplicates the Python interpreter, imported modules, and in-memory caches)	Lower footprint (single interpreter instance shared across all connections)
Startup Latency	Slower cold start due to process forking and multiple application loads	Faster initialization (single process startup)
Production Resilience	Graceful reloads, worker timeouts, server hooks such as pre-fork, and robust signal handling	`--reload` is intended for development only; production deployments typically rely on an external process manager

Implementation Details

The LangflowUvicornWorker Class

Defined in src/backend/base/langflow/server.py, the custom worker inherits from uvicorn.workers.UvicornWorker and configures the asyncio event loop while adding production-specific logging integration:

class LangflowUvicornWorker(UvicornWorker):
    CONFIG_KWARGS = {"loop": "asyncio"}
    ...

The worker installs a custom SIGINT handler to suppress noisy shutdown logs. Alongside it, a custom Logger class routes Gunicorn's log output through Loguru, so log formatting stays consistent across the application.
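The log-routing idea can be sketched with the standard library alone. This is an assumption-laden illustration rather than Langflow's Logger class: ForwardingHandler and route_logger are hypothetical names, and a plain callable stands in for Loguru so the example has no third-party dependencies. The logger name "gunicorn.access" is the one Gunicorn uses for access logs.

```python
import logging
from typing import Callable, List, Tuple


class ForwardingHandler(logging.Handler):
    """Forward each stdlib log record to a single sink callable."""

    def __init__(self, sink: Callable[[str, str], None]) -> None:
        super().__init__()
        self._sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        # Pass the level name and the fully formatted message to the sink.
        self._sink(record.levelname, record.getMessage())


def route_logger(name: str, sink: Callable[[str, str], None]) -> logging.Logger:
    """Replace a logger's handlers so all of its output goes through `sink`."""
    logger = logging.getLogger(name)
    logger.handlers = [ForwardingHandler(sink)]
    logger.propagate = False  # avoid duplicate output via the root logger
    logger.setLevel(logging.INFO)
    return logger


# Stand-in sink; in a Loguru-based setup this would call into Loguru instead.
captured: List[Tuple[str, str]] = []
access_log = route_logger("gunicorn.access", lambda level, msg: captured.append((level, msg)))
access_log.info("GET /health 200")
```

Routing every server logger through one sink is what keeps access, error, and application logs in the same format.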

Worker Configuration

The get_number_of_workers function determines the default worker count based on available CPU cores, allowing the Gunicorn deployment to automatically scale with the host machine's hardware capabilities.
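A worker-count helper of this kind might look like the sketch below. It is not a copy of get_number_of_workers: the name default_worker_count, the cores parameter, and the treatment of -1 as "auto" are assumptions for illustration, and the (2 x cores) + 1 formula is the starting point suggested in Gunicorn's own documentation, which may not match the formula in a given Langflow version.

```python
import os
from typing import Optional


def default_worker_count(requested: Optional[int] = None,
                         cores: Optional[int] = None) -> int:
    """Return an explicit worker count, or derive one from the CPU count.

    Hypothetical helper; `None` or -1 means "choose automatically".
    """
    if cores is None:
        # os.cpu_count() can return None on unusual platforms.
        cores = os.cpu_count() or 1
    if requested is None or requested == -1:
        # Gunicorn's documented rule of thumb, used here as an assumption.
        return cores * 2 + 1
    if requested < 1:
        raise ValueError("worker count must be a positive integer")
    return requested
```

An explicit request always wins, so an operator can cap the count on memory-constrained hosts while still getting a sensible automatic default.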

How to Start Langflow with Each Worker Type

Gunicorn Mode (Linux/macOS Default)

Run the standard Langflow command to automatically select the Gunicorn path:

python -m langflow

Or invoke Gunicorn directly with the custom worker class and explicit worker count:

gunicorn \
    --worker-class langflow.server.LangflowUvicornWorker \
    --workers $(nproc) \
    --bind 0.0.0.0:7860 \
    --log-level info \
    langflow.app:app
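The same options can live in a Gunicorn configuration file, which is itself a Python module. The fragment below is a sketch that mirrors the flags in the command above; the setting names (worker_class, workers, bind, loglevel) are standard Gunicorn settings, while the values simply repeat those used in this article.

```python
# gunicorn.conf.py
import multiprocessing

# Custom worker class described in this article.
worker_class = "langflow.server.LangflowUvicornWorker"

# One worker per core, matching --workers $(nproc) in the command above.
workers = multiprocessing.cpu_count()

# Listen on all interfaces on the port used throughout this article.
bind = "0.0.0.0:7860"

# Same verbosity as --log-level info.
loglevel = "info"
```

With this file in place, the invocation shrinks to gunicorn -c gunicorn.conf.py langflow.app:app, and the settings can be reviewed and versioned alongside the deployment.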

Pure Uvicorn Mode (Windows or Manual Override)

Force single-process mode on any platform by bypassing the automatic detection:

uvicorn langflow.app:app \
    --host 0.0.0.0 \
    --port 7860 \
    --log-level info

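Alternatively, setting LANGFLOW_FORCE_UVICORN=1 before running python -m langflow is meant to override the platform check in src/backend/base/langflow/__main__.py. Confirm that your installed Langflow version actually reads this variable before relying on it; if it does not, use the direct uvicorn invocation above.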

Summary

Gunicorn with Uvicorn workers delivers superior throughput on multi-core servers through process-level parallelism, but consumes significantly more RAM due to interpreter duplication.
Pure Uvicorn provides faster startup and lower memory overhead, making it ideal for development environments, Windows deployments, and single-core constrained infrastructure.
The LangflowUvicornWorker class in src/backend/base/langflow/server.py customizes the standard Uvicorn worker by pinning the asyncio event loop and adding a custom SIGINT handler, while a companion Logger class routes Gunicorn logs through Loguru.
Platform detection in src/backend/base/langflow/__main__.py automatically selects the optimal architecture, though manual override is possible via environment variables or direct server invocation.

Frequently Asked Questions

Which worker type is best for production deployments of Langflow?

Gunicorn with Uvicorn workers is the recommended choice for production environments running on Linux or macOS. The multi-process architecture provides true parallelism across CPU cores, handles concurrent requests more efficiently under heavy load, and supports graceful worker restarts without dropping active connections.

Why does Langflow use Gunicorn on Linux but not Windows?

The operating system check in src/backend/base/langflow/__main__.py directs Windows installations to pure Uvicorn because Windows lacks support for the fork() system call, which Gunicorn relies on to spawn worker processes efficiently. On Linux and macOS, the fork() mechanism enables robust process management and memory isolation between workers.

How do I configure the number of Gunicorn workers in Langflow?

Pass the --workers flag when starting Langflow, or set the LANGFLOW_WORKERS environment variable. When no explicit count is supplied, the underlying code calls get_number_of_workers() to derive a default from the CPU count; the exact formula can differ between Langflow versions, so check the function in your installed release. For direct Gunicorn invocation, use --workers $(nproc) to match your machine's core count.

Does using Gunicorn significantly increase memory usage?

Yes. Each Gunicorn worker runs as a separate Python process, duplicating the interpreter state, imported modules, and Langflow's in-memory graph caches. While this isolation improves stability and crash resistance, deployments should plan for memory consumption that scales linearly with the number of configured workers.
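The linear relationship makes capacity planning a matter of simple arithmetic. The sketch below is a rough planning aid under stated assumptions: both function names are hypothetical, every worker's footprint is treated as fully private (so the result is a conservative estimate), and the per-worker figure must come from measuring your own deployment.

```python
def estimate_total_memory_mb(workers: int, per_worker_mb: float,
                             master_mb: float = 0.0) -> float:
    """Estimate total resident memory as master + workers * per-worker."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return master_mb + workers * per_worker_mb


def max_workers_for_budget(budget_mb: float, per_worker_mb: float,
                           master_mb: float = 0.0) -> int:
    """Return the largest worker count that fits inside a memory budget."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    return max(0, int((budget_mb - master_mb) // per_worker_mb))
```

For example, with placeholder figures of 500 MB per worker and 100 MB for the master process, four workers need about 2100 MB, and a 4096 MB budget fits seven workers.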