internals

How Pyrefly Handles Import Resolution and Module Finding

May 21, 2026 facebook/pyrefly ↗

Pyrefly resolves imports through a hierarchical search algorithm implemented in pyrefly/lib/module/finder.rs that mirrors Python's import machinery while adding directory caching, stub package handling, and legacy namespace package support.

Import resolution forms the foundation of static type checking, mapping module names to filesystem paths before type analysis begins. In the facebook/pyrefly repository, this critical functionality is centralized in the pyrefly::module::finder module, which reimplements Python's import semantics with optimizations for performance and stub compatibility. Understanding Pyrefly's import resolution reveals how the tool navigates complex package layouts, from namespace packages to third-party type stubs.

The Import Resolution Pipeline

Entry Point: `find_import_internal`

The public API for import resolution is find_import_internal (lines 1219‑1234 in pyrefly/lib/module/finder.rs). This function accepts a ConfigFile containing search paths, the target ModuleName to resolve, an optional origin path indicating which file performs the import, a style_filter for preferring .pyi over .py files, and a DirEntryCache for filesystem operations. It returns a FindingOrError<ModulePath> (lines 108‑124) that either contains the resolved filesystem path or a detailed error indicating why resolution failed.

Hierarchical Search Order

Pyrefly queries module sources in a strict sequence defined by the configuration:

Build-system-provided search paths
Source-database cache (if configured)
User-specified search_path directories
Custom typeshed path (optional)
Bundled standard-library typeshed
Heuristic fallback search paths
Site-package paths including third-party stubs
Extra file extension handlers
Implicit namespace package fallbacks

This ordering ensures that user configuration overrides system defaults while maintaining compatibility with Python's module search semantics.

Component Resolution Chain

For each search location, the helper find_module (lines 786‑863) decomposes the target ModuleName into its first component and remaining submodules. It checks for a -stubs variant of the initial component (e.g., foo-stubs for package foo), then delegates to find_module_components (lines 692‑755). The final filesystem checks occur in find_one_part_in_root (lines 334‑425), which distinguishes between regular packages (directories with __init__.py), single-file modules (.py or .pyi), and compiled extensions (.pyc, .pyd).

Module Discovery and Filesystem Interaction

Directory Caching with `DirEntryCache`

All filesystem operations flow through DirEntryCache (lines 441‑518), which stores directory listings in a HashMap<PathBuf, Option<Arc<SmallMap<OsString, bool>>>>. This eliminates redundant stat and readdir calls, reducing existence checks to O(1) lookups. The cache proves essential for large repositories or remote filesystems like EdenFS, where repeated directory traversals would create significant I/O overhead.

Handling Namespace Packages

When multiple search roots contain directories with the same package name, NamespaceAccumulator (lines 403‑447) merges these paths to emulate Python's __path__ extension semantics. Pyrefly distinguishes between two namespace variants:

Legacy namespace packages: Detected by is_pkgutil_namespace (lines 96‑122), which reads the first 4KiB of __init__.py and runs a regex to identify pkgutil.extend_path usage. The first discovered __init__.py wins, but all directories contribute to the package path.
Implicit namespace packages: Directories without an __init__.py file (PEP 420), where all matching directories across roots are accumulated into a single logical package.

Stub Package Resolution

The resolver prioritizes type information through resolve_third_party_stub (lines 657‑692). It first searches for packages with a -stubs suffix (e.g., requests-stubs), then falls back to third-party typeshed stubs via typeshed_third_party.rs, and finally checks the bundled standard-library typeshed. The FindResult::best_result method (lines 889‑906) enforces precedence rules: regular packages outrank legacy namespaces, which outrank single-file modules, which outrank compiled extensions.

Extra Extension Handling

For non-standard filename patterns where dots act as module separators (e.g., a.b.c.cinc), find_extra_extension_module (lines 1006‑1060) provides specialized handling that maps these dotted filenames back to module path components.

Resolving Imports in Practice

The following Rust example demonstrates how to programmatically resolve a module import using Pyrefly's public API:

use pyrefly::config::config::ConfigFile;
use pyrefly::module::finder::find_import_internal;
use pyrefly_python::module_name::ModuleName;
use pyrefly::state::state::TransactionTimingCounters;

// Load a configuration (e.g., from pyproject.toml)
let config = ConfigFile::load_default().expect("load config");

// The module we want to resolve, e.g., `numpy.linalg`
let target = ModuleName::from_str("numpy.linalg");

// Optional: the file that contains the import statement
let origin = None;

// We prefer interface (`.pyi`) files but fall back to implementation (`.py`)
let style_filter = None;

// Collect phantom-path diagnostics (optional)
let mut phantom_paths = None;

// Directory cache to speed up repeated lookups
let dir_cache = pyrefly::module::finder::DirEntryCache::new(true);

// Timing counters for profiling (optional)
let timing = None;

// Resolve the import
let result = find_import_internal(
    &config,
    target,
    origin,
    style_filter,
    &mut phantom_paths,
    &dir_cache,
    timing,
);

match result {
    pyrefly::state::loader::FindingOrError::Ok(path) => {
        println!("Found module at {}", path.display());
    }
    pyrefly::state::loader::FindingOrError::Error(err) => {
        eprintln!("Import resolution failed: {:?}", err);
    }
}

This snippet demonstrates the typical usage path: load a ConfigFile, construct a ModuleName, and invoke find_import_internal. The function internally traverses the hierarchical search order and returns either a concrete filesystem path or a FindError indicating issues like Ignored or MissingSource.

Key Source Files and Architecture

File	Role
`pyrefly/lib/module/finder.rs`	Core import-resolution engine implementing search logic, caching, and namespace handling
`pyrefly/lib/module/typeshed.rs`	Provides bundled standard-library stub lookup
`pyrefly/lib/module/third_party.rs`	Handles third-party stub discovery via `get_bundled_third_party`
`pyrefly/lib/module/typeshed_third_party.rs`	Looks up third-party stubs shipped with Pyrefly
`pyrefly/lib/module/parse.rs`	Parses discovered files into ASTs after resolution

Summary

Pyrefly's import resolution centers on find_import_internal in finder.rs, implementing Python's import semantics with a configurable hierarchical search path.
Performance optimization comes from DirEntryCache (lines 441‑518), which minimizes filesystem operations through in-memory caching of directory listings.
Namespace package support uses NamespaceAccumulator (lines 403‑447) to handle both legacy pkgutil.extend_path packages and PEP 420 implicit namespace packages.
Stub prioritization follows a strict order: -stubs packages, third-party typeshed, bundled typeshed, then implementation files, managed by resolve_third_party_stub.
Precedence rules in FindResult::best_result (lines 889‑906) ensure regular packages outrank legacy namespaces, which outrank single-file modules and compiled extensions.

Frequently Asked Questions

How does Pyrefly handle namespace packages differently from standard Python?

Pyrefly's NamespaceAccumulator (lines 403‑447) explicitly tracks multiple directory roots containing the same package name, building either a LegacyNamespacePackage (when is_pkgutil_namespace detects pkgutil.extend_path in the first 4KiB of __init__.py) or an ImplicitNamespacePackage (no __init__.py present). This matches Python's __path__ extension semantics while maintaining compatibility with both legacy and PEP 420 namespace packages.

What is the performance impact of Pyrefly's directory caching?

The DirEntryCache (lines 441‑518) stores directory entries in a HashMap<PathBuf, Option<Arc<SmallMap<OsString, bool>>>>, reducing stat and readdir operations to O(1) lookups. This optimization is critical for large repositories or remote filesystems like EdenFS, where repeated directory traversals would create significant overhead during import resolution.

How does Pyrefly prioritize between `.py` and `.pyi` stub files?

The style_filter parameter in find_import_internal allows explicit preference for interface (.pyi) or implementation (.py) files. When resolving third-party modules, the system first checks for packages with a -stubs suffix, then consults bundled typeshed definitions via resolve_third_party_stub (lines 657‑692) before falling back to standard module discovery.

Where does Pyrefly search for modules during resolution?

Pyrefly follows a strict hierarchical order defined in find_import_internal (lines 1219‑1234): build-system paths, source-database cache, user search_path, custom typeshed, bundled standard-library typeshed, heuristic paths, site-packages (including third-party stubs), extra extensions, and implicit namespace fallbacks. This sequence ensures compatibility with Python's module search semantics while allowing configuration overrides.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:

curl -s "https://instagit.com/install.md"

Add to your MCP client configuration:

{
  "mcpServers": {
    "instagit": {
      "command": "npx",
      "args": ["-y", "instagit@latest"]
    }
  }
}

Ask your agent:

"Use Instagit MCP to understand how facebook/pyrefly works."

Works with

Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →