# How Pyrefly Handles Module-Level Incrementality and Parallelism for Large Codebases

> Learn how Pyrefly manages module-level incrementality and parallelism for large codebases using a per-module state machine and a configurable, Rayon-based thread pool for efficient type checking.

- Repository: [facebook/pyrefly](https://github.com/facebook/pyrefly)
- Tags: internals
- Published: 2026-05-21

---

**Pyrefly uses a per-module state machine that tracks forward and reverse dependencies to invalidate only changed modules and their transitive dependents, executing all type-checking work on a configurable Rayon-based thread pool that defaults to the number of logical CPUs.**

Meta's Pyrefly is a high-performance Python type checker designed to scale to millions of lines of code. Its architecture centers on **module-level incrementality and parallelism**: editing a single file should trigger minimal recomputation, and the work that remains is spread across every core of a multi-core machine. The implementation relies on explicit dependency graphs and a thread pool built on Rayon's work-stealing scheduler to keep feedback fast even in large codebases.

## Per-Module State and Dependency Tracking

### The ModuleDataMut Struct

At the heart of Pyrefly's incrementality is `ModuleDataMut`, defined in [`pyrefly/lib/state/module.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/module.rs). Each module maintains two critical sets:

- **`deps`**: Modules this file imports (forward dependencies)
- **`rdeps`**: Modules that import this file (reverse dependencies)

These fields enable precise invalidation. When a module changes, Pyrefly consults its `rdeps` to determine which other modules might require re-checking. The reverse dependency set is kept under a lock, as noted in the comment at lines 1292-1294 of [`pyrefly/lib/state/state.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/state.rs), ensuring thread-safe access during parallel invalidation.
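The sketch below illustrates how a per-module record can keep `deps` and `rdeps` in sync, with only the reverse edges behind a lock because they are written by importing modules, potentially from several threads at once. `ModuleRecord`, `Graph`, `add_module`, `link_rdeps`, and `build_graph` are hypothetical names, plain strings stand in for Pyrefly's module handles, and the real `ModuleDataMut` holds far more state than this.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;
use std::thread;

/// Hypothetical module identifier; Pyrefly uses its own handle type.
type ModuleId = &'static str;

/// Illustrative stand-in for the dependency part of a per-module record.
#[derive(Default)]
struct ModuleRecord {
    /// Forward edges: written only when this module itself is (re)checked.
    deps: HashSet<ModuleId>,
    /// Reverse edges: written by importing modules, possibly on other threads.
    rdeps: Mutex<HashSet<ModuleId>>,
}

#[derive(Default)]
struct Graph {
    modules: HashMap<ModuleId, ModuleRecord>,
}

impl Graph {
    /// Register a module and the modules it imports.
    fn add_module(&mut self, id: ModuleId, imports: &[ModuleId]) {
        self.modules.entry(id).or_default().deps = imports.iter().copied().collect();
    }

    /// Add `id` to the `rdeps` of everything it imports. Only `&self` is
    /// needed because each `rdeps` set carries its own lock.
    fn link_rdeps(&self, id: ModuleId) {
        for dep in &self.modules[id].deps {
            if let Some(record) = self.modules.get(dep) {
                record.rdeps.lock().unwrap().insert(id);
            }
        }
    }

    /// Snapshot of the modules that import `id`.
    fn rdeps_of(&self, id: ModuleId) -> HashSet<ModuleId> {
        self.modules
            .get(id)
            .map(|record| record.rdeps.lock().unwrap().clone())
            .unwrap_or_default()
    }
}

/// Register every module, then publish reverse edges from one thread per module.
fn build_graph(edges: &[(ModuleId, Vec<ModuleId>)]) -> Graph {
    let mut graph = Graph::default();
    for (id, imports) in edges {
        graph.add_module(*id, imports);
    }
    let shared = &graph;
    thread::scope(|s| {
        for (id, _) in edges {
            s.spawn(move || shared.link_rdeps(*id));
        }
    });
    graph
}
```

In this sketch, forward edges change only when a module is itself rechecked, so they can live in that module's own data, while reverse edges, which any importer may add concurrently, are the part that needs the lock.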

## Incremental Invalidation and Recomputation

### The invalidate_rdeps Algorithm

When a file is edited, Pyrefly does not perform a full-project re-check. Instead, `State::invalidate_rdeps` (lines 1933-1955 of [`pyrefly/lib/state/state.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/state.rs)) walks the reverse-dependency graph transitively. It collects every affected module, cloning each `rdeps` set before iterating over it to avoid double-counting.
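A minimal worklist version of transitive invalidation might look like the following. It assumes a hypothetical table mapping each module to a locked set of its importers; `RdepTable` and this `invalidate_rdeps` function are illustrative and do not match Pyrefly's signatures. In the sketch, each set is cloned so its lock is held only for the copy, and a visited set guarantees no module is expanded twice. The changed modules themselves are included in the result.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

/// Hypothetical module identifier.
type ModuleId = &'static str;

/// For each module, a locked set of the modules that import it.
type RdepTable = HashMap<ModuleId, Mutex<HashSet<ModuleId>>>;

/// Collect `changed` plus every module that imports it directly or indirectly.
/// Illustrative only; not the signature of `State::invalidate_rdeps`.
fn invalidate_rdeps(table: &RdepTable, changed: &[ModuleId]) -> HashSet<ModuleId> {
    let mut dirty: HashSet<ModuleId> = HashSet::new();
    let mut worklist: Vec<ModuleId> = changed.to_vec();
    while let Some(m) = worklist.pop() {
        // `insert` returns false for modules already collected, so each one is
        // expanded at most once even if several import paths lead to it.
        if !dirty.insert(m) {
            continue;
        }
        // Copy the importer set while holding its lock; the temporary guard is
        // dropped at the end of the statement, before the walk continues.
        if let Some(importers_lock) = table.get(m) {
            let importers = importers_lock.lock().unwrap().clone();
            worklist.extend(importers);
        }
    }
    dirty
}
```

Because modules are deduplicated on insertion, a diamond-shaped import graph (two paths to the same importer) costs one expansion of the shared importer, not two.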

### Transitive Dependencies for LSP

For Language Server Protocol (LSP) features such as rename and safe delete, Pyrefly needs the full transitive closure of reverse dependencies. The `State::get_transitive_rdeps` method (lines 1002-1015 of the same `state.rs`) performs a breadth-first walk, deduplicating handles as it goes. This walk powers the handler in [`pyrefly/lib/lsp/non_wasm/will_rename_files.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/lsp/non_wasm/will_rename_files.rs), which restricts the resulting edits to the affected files.
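A breadth-first closure that deduplicates as it goes, in the spirit of `get_transitive_rdeps`, can be sketched as follows. The `importers` map and the function signature are assumptions for illustration. The start module is excluded from the result, and an import cycle leading back to it is not reported.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical module identifier.
type ModuleId = &'static str;

/// Breadth-first transitive closure over reverse dependencies.
/// `importers` maps a module to the modules that import it (an assumed input).
/// The start module is excluded from the result.
fn get_transitive_rdeps(
    importers: &HashMap<ModuleId, Vec<ModuleId>>,
    start: ModuleId,
) -> Vec<ModuleId> {
    let mut seen: HashSet<ModuleId> = HashSet::from([start]);
    let mut queue: VecDeque<ModuleId> = VecDeque::from([start]);
    let mut result = Vec::new();
    while let Some(m) = queue.pop_front() {
        for &importer in importers.get(m).into_iter().flatten() {
            // Deduplicate as we go: each module is queued and reported once.
            if seen.insert(importer) {
                result.push(importer);
                queue.push_back(importer);
            }
        }
    }
    result
}
```

Because the queue is FIFO, direct importers appear before indirect ones, so the result is ordered by distance from the starting module.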

## Parallel Execution Architecture

### ThreadPool Configuration

All heavy computation (parsing, binding, and solving) runs on a `ThreadPool` from `pyrefly_util::thread_pool`. The pool is created once per `State` in `State::new` and stored in the `State::threads` field (line 61 of [`pyrefly/lib/state/state.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/state.rs)).

By default, the pool size equals the number of logical CPUs (capped at 64). You can override this via environment variables before starting Pyrefly:

```bash
export PYREFLY_THREAD_COUNT=8
export PYREFLY_STACK_SIZE=8388608  # 8 MiB, expressed in bytes
```
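The default-and-override behavior described above can be expressed as a small resolution step. This is a sketch: the `ThreadCount` shape echoes the `Auto` and `Fixed(usize)` variants mentioned in this article's FAQ but is not Pyrefly's definition, and treating a zero or malformed value as `Auto` is an assumption. Only the standard library is used (`std::thread::available_parallelism` and `std::thread::Builder::stack_size`); `from_override`, `resolve`, and `run_with_stack` are hypothetical helpers.

```rust
use std::io;
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical thread-count setting (not Pyrefly's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadCount {
    Auto,
    Fixed(NonZeroUsize),
}

/// Cap applied to the automatic choice, matching the 64-thread limit above.
const MAX_AUTO_THREADS: usize = 64;

impl ThreadCount {
    /// Interpret an optional override string, e.g. the value of PYREFLY_THREAD_COUNT.
    /// Assumption: missing, zero, or malformed values fall back to `Auto`.
    fn from_override(value: Option<&str>) -> Self {
        match value.and_then(|v| v.trim().parse::<NonZeroUsize>().ok()) {
            Some(n) => ThreadCount::Fixed(n),
            None => ThreadCount::Auto,
        }
    }

    /// Turn the setting into a concrete worker count.
    fn resolve(self) -> usize {
        match self {
            ThreadCount::Fixed(n) => n.get(),
            ThreadCount::Auto => thread::available_parallelism()
                .map(NonZeroUsize::get)
                .unwrap_or(1)
                .min(MAX_AUTO_THREADS),
        }
    }
}

/// Run `f` on a fresh thread with an explicit stack size in bytes
/// (8 MiB = 8 * 1024 * 1024 = 8388608).
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> io::Result<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let handle = thread::Builder::new().stack_size(stack_bytes).spawn(f)?;
    Ok(handle.join().expect("worker thread panicked"))
}
```

In a real entry point, the override string would come from the environment, for example `ThreadCount::from_override(std::env::var("PYREFLY_THREAD_COUNT").ok().as_deref())`.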

Internally, `ThreadPool::new` (lines 74-92 of [`pyrefly_util/src/thread_pool.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly_util/src/thread_pool.rs)) constructs a Rayon pool, while `ThreadPool::install` (lines 129-137) runs a closure inside that pool, so any Rayon parallel work the closure starts is scheduled on the pool's worker threads.

### Fine-Grained Task Distribution

Individual phases spawn parallel tasks via `ThreadPool::spawn_many` or `ThreadPool::async_spawn` (lines 103-117 and 119-125 of the same file). Work can also be fanned out through `install`: the binding phase, for example, iterates over all modules inside a call to `install` on the pool, letting Rayon distribute the work across the available threads.
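To show the shape of per-module fan-out without pulling in the `rayon` crate, here is a standard-library sketch in which scoped workers claim module indices from a shared atomic counter. `parallel_map` is a hypothetical helper: it uses dynamic self-scheduling rather than Rayon's work stealing, and it is not the implementation of `spawn_many`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Apply `work` to every item on `threads` scoped worker threads.
/// Workers repeatedly claim the next unprocessed index from a shared counter,
/// so a thread that finishes early simply takes more items. This is dynamic
/// self-scheduling with the standard library, not Rayon's work stealing.
fn parallel_map<T, R, F>(items: &[T], threads: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let slots: Mutex<Vec<Option<R>>> = Mutex::new(items.iter().map(|_| None).collect());
    thread::scope(|s| {
        for _ in 0..threads.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(item) = items.get(i) else { break };
                let result = work(item); // runs without holding the lock
                slots.lock().unwrap()[i] = Some(result);
            });
        }
    });
    slots
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|slot| slot.expect("each index is claimed by exactly one worker"))
        .collect()
}
```

Results are written back by index, so the output order matches the input order no matter which worker handled each item.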

## The LIFO Work Queue and Eager Scheduling

### The run_step Entry Point

The incremental run loop centers on `State::run_step` (lines 1864-1890 of [`pyrefly/lib/state/state.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/state.rs)). This method drives a single "epoch" of type checking by grabbing the dirty set, resolving imports, and running the solving phase.

### LIFO Queue Benefits

The internal todo queue (`self.data.todo`) uses a **LIFO (last-in-first-out)** strategy via `push_lifo`. After a module reaches the `Solutions` phase, `run_step` immediately pushes its dependents onto the queue, producing a depth-first processing order that keeps hot data in cache. As the comment around line 1850 explains, this design reduces wake-ups for large strongly connected components.
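The effect of the LIFO order can be sketched with a plain `Vec` used as a stack. In this hypothetical `run_step_lifo`, finishing a module pushes its dirty dependents, so they are processed right after it rather than after everything already queued. Module names are strings, the `dependents` map is an assumed input, and the readiness checks a real checker performs before solving a module are omitted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical module identifier.
type ModuleId = &'static str;

/// Process dirty modules with a LIFO stack. When a module finishes solving,
/// its dirty dependents are pushed on top and therefore run next, which
/// yields a depth-first order. Readiness checks that a real scheduler needs
/// before solving a module are deliberately omitted.
fn run_step_lifo(
    dirty: &[ModuleId],
    dependents: &HashMap<ModuleId, Vec<ModuleId>>,
) -> Vec<ModuleId> {
    let pending: HashSet<ModuleId> = dirty.iter().copied().collect();
    let mut finished: HashSet<ModuleId> = HashSet::new();
    let mut order = Vec::new();
    // Seed in reverse so that the first dirty module is popped first.
    let mut stack: Vec<ModuleId> = dirty.iter().rev().copied().collect();
    while let Some(m) = stack.pop() {
        if !finished.insert(m) {
            continue; // already handled through another dependent chain
        }
        order.push(m); // stand-in for parsing, binding, and solving `m`
        // Push in reverse so the first-listed dependent is popped first.
        for &d in dependents.get(m).into_iter().flatten().rev() {
            if pending.contains(d) && !finished.contains(d) {
                stack.push(d);
            }
        }
    }
    order
}
```

With `a` feeding `b` and `c`, and `b` feeding `d`, this visits `a, b, d, c`: the chain through `b` is finished while its data is still warm, whereas a FIFO queue seeded with the same modules would visit `a, b, c, d`.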

## LSP Integration and Lazy Evaluation

Pyrefly uses a lazy evaluation strategy by default. Only modules that are directly requested (for example, the file open in the IDE) are solved up to their requested level, such as `Require::Exports`. Transitively imported modules stay unevaluated until checking a requested module actually needs their types. This trade-off, explained in the comment around lines 1512-1545 of [`pyrefly/lib/state/state.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/state.rs), minimizes unnecessary work during the initial load.
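The lazy strategy amounts to demand-driven, memoized evaluation. The sketch below is a hypothetical illustration (`Lazy` and `solve` are invented names, and an integer stands in for a module's solved types): requesting one module forces only the modules it transitively imports, and each result is computed once. It assumes an acyclic import graph; a real checker needs extra handling for cycles.

```rust
use std::collections::HashMap;

/// Hypothetical module identifier.
type ModuleId = &'static str;

/// Demand-driven, memoized evaluation: nothing is computed until requested,
/// and each module is computed at most once. Assumes an acyclic import graph.
struct Lazy {
    imports: HashMap<ModuleId, Vec<ModuleId>>,
    cache: HashMap<ModuleId, usize>,
    /// Modules in the order they were actually computed.
    evaluated: Vec<ModuleId>,
}

impl Lazy {
    fn new(imports: HashMap<ModuleId, Vec<ModuleId>>) -> Self {
        Lazy { imports, cache: HashMap::new(), evaluated: Vec::new() }
    }

    /// Stand-in "solve": a module's value is 1 plus its imports' values.
    /// Imports are forced here, i.e. only when a requested module needs them.
    fn solve(&mut self, m: ModuleId) -> usize {
        if let Some(&value) = self.cache.get(m) {
            return value; // already computed: no extra work
        }
        self.evaluated.push(m);
        let deps = self.imports.get(m).cloned().unwrap_or_default();
        let value = 1 + deps.into_iter().map(|d| self.solve(d)).sum::<usize>();
        self.cache.insert(m, value);
        value
    }
}
```

Requesting `main` in a project where `unused` also imports `base` computes `main`, `util`, and `base` only; `unused` is never touched until something asks for it.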

When performing refactors such as file renames, the LSP server queries `get_transitive_rdeps` to compute the set of files that may need updates. This keeps the IDE from scanning the entire codebase when the renamed module is imported only by specific downstream modules.

## Summary

- **Per-module dependency tracking**: `ModuleDataMut` stores `deps` and `rdeps` for precise invalidation of only affected modules.
- **Transitive invalidation**: `invalidate_rdeps` walks the reverse dependency graph to collect all modules needing re-checks without full graph scans.
- **Rayon-based parallelism**: A configurable `ThreadPool` defaults to logical CPU count, executing parsing, binding, and solving across all cores.
- **LIFO eager scheduling**: `run_step` uses depth-first queuing to improve cache locality and reduce coordination overhead.
- **Lazy evaluation and LSP integration**: Only requested modules are solved eagerly, and features like rename use `get_transitive_rdeps` to limit computation to genuinely affected files.

## Frequently Asked Questions

### How does Pyrefly determine which modules to recheck after a file change?

Pyrefly uses the `rdeps` (reverse dependencies) set stored in each `ModuleDataMut`. When a file changes, `State::invalidate_rdeps` (lines 1933-1955 of [`pyrefly/lib/state/state.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/state/state.rs)) walks the reverse-dependency graph transitively, collecting all modules that import the changed file either directly or indirectly. Only these modules are added to the dirty set for recomputation.

### What thread pool does Pyrefly use for parallel type checking?

Pyrefly uses a custom `ThreadPool` wrapper around Rayon, defined in [`pyrefly_util/src/thread_pool.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly_util/src/thread_pool.rs). The pool is created in `State::new` and stored in `State::threads`. By default, it spawns one thread per logical CPU (capped at 64), but developers can override this via the `PYREFLY_THREAD_COUNT` environment variable or adjust stack size with `PYREFLY_STACK_SIZE`.

### How does Pyrefly handle deep import chains or circular dependencies?

The engine handles deep import chains and cycles through a combination of **LIFO eager scheduling** and **lazy evaluation**. The `run_step` method pushes dependent modules onto a LIFO queue immediately after processing, creating a depth-first traversal that improves cache locality. For cycles, modules are solved lazily, only when their results are actually required, which prevents infinite loops while parallel workers continue on independent subgraphs.

### Can I configure Pyrefly's parallelism settings for testing or resource-constrained environments?

Yes. The test harness in [`pyrefly/lib/test/incremental.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly/lib/test/incremental.rs) uses `TEST_THREAD_COUNT` to create a three-thread pool for deterministic testing. For production use, set the `PYREFLY_THREAD_COUNT` environment variable to limit the number of threads, or set `PYREFLY_STACK_SIZE` (in bytes) to enlarge each worker's stack and avoid stack overflows on deeply nested ASTs. The `ThreadCount` enum in [`pyrefly_util/src/thread_pool.rs`](https://github.com/facebook/pyrefly/blob/main/pyrefly_util/src/thread_pool.rs) supports `Auto`, `Fixed(usize)`, or environment-based configuration.