How Pyrefly Handles Import Resolution and Module Finding

Pyrefly resolves imports through a hierarchical search algorithm implemented in pyrefly/lib/module/finder.rs that mirrors Python's import machinery while adding directory caching, stub package handling, and legacy namespace package support.

Import resolution forms the foundation of static type checking, mapping module names to filesystem paths before type analysis begins. In the facebook/pyrefly repository, this critical functionality is centralized in the pyrefly::module::finder module, which reimplements Python's import semantics with optimizations for performance and stub compatibility. Understanding Pyrefly's import resolution reveals how the tool navigates complex package layouts, from namespace packages to third-party type stubs.

The Import Resolution Pipeline

Entry Point: find_import_internal

The public API for import resolution is find_import_internal (lines 1219‑1234 in pyrefly/lib/module/finder.rs). This function accepts a ConfigFile containing search paths, the target ModuleName to resolve, an optional origin path indicating which file performs the import, a style_filter for preferring .pyi over .py files, and a DirEntryCache for filesystem operations. It returns a FindingOrError<ModulePath> (lines 108‑124) that either contains the resolved filesystem path or a detailed error indicating why resolution failed.

Hierarchical Search Order

Pyrefly queries module sources in a strict sequence defined by the configuration:

  • Build-system-provided search paths
  • Source-database cache (if configured)
  • User-specified search_path directories
  • Custom typeshed path (optional)
  • Bundled standard-library typeshed
  • Heuristic fallback search paths
  • Site-package paths including third-party stubs
  • Extra file extension handlers
  • Implicit namespace package fallbacks

This ordering ensures that user configuration overrides system defaults while maintaining compatibility with Python's module search semantics.

Component Resolution Chain

For each search location, the helper find_module (lines 786‑863) decomposes the target ModuleName into its first component and remaining submodules. It checks for a -stubs variant of the initial component (e.g., foo-stubs for package foo), then delegates to find_module_components (lines 692‑755). The final filesystem checks occur in find_one_part_in_root (lines 334‑425), which distinguishes between regular packages (directories with __init__.py), single-file modules (.py or .pyi), and compiled extensions (.pyc, .pyd).

Module Discovery and Filesystem Interaction

Directory Caching with DirEntryCache

All filesystem operations flow through DirEntryCache (lines 441‑518), which stores directory listings in a HashMap<PathBuf, Option<Arc<SmallMap<OsString, bool>>>>. This eliminates redundant stat and readdir calls, reducing existence checks to O(1) lookups. The cache proves essential for large repositories or remote filesystems like EdenFS, where repeated directory traversals would create significant I/O overhead.

Handling Namespace Packages

When multiple search roots contain directories with the same package name, NamespaceAccumulator (lines 403‑447) merges these paths to emulate Python's __path__ extension semantics. Pyrefly distinguishes between two namespace variants:

  • Legacy namespace packages: Detected by is_pkgutil_namespace (lines 96‑122), which reads the first 4KiB of __init__.py and runs a regex to identify pkgutil.extend_path usage. The first discovered __init__.py wins, but all directories contribute to the package path.
  • Implicit namespace packages: Directories without an __init__.py file (PEP 420), where all matching directories across roots are accumulated into a single logical package.

Stub Package Resolution

The resolver prioritizes type information through resolve_third_party_stub (lines 657‑692). It first searches for packages with a -stubs suffix (e.g., requests-stubs), then falls back to third-party typeshed stubs via typeshed_third_party.rs, and finally checks the bundled standard-library typeshed. The FindResult::best_result method (lines 889‑906) enforces precedence rules: regular packages outrank legacy namespaces, which outrank single-file modules, which outrank compiled extensions.

Extra Extension Handling

For non-standard filename patterns where dots act as module separators (e.g., a.b.c.cinc), find_extra_extension_module (lines 1006‑1060) provides specialized handling that maps these dotted filenames back to module path components.

Resolving Imports in Practice

The following Rust example demonstrates how to programmatically resolve a module import using Pyrefly's public API:

use pyrefly::config::config::ConfigFile;
use pyrefly::module::finder::find_import_internal;
use pyrefly_python::module_name::ModuleName;
use pyrefly::state::state::TransactionTimingCounters;

// Load a configuration (e.g., from pyproject.toml)
let config = ConfigFile::load_default().expect("load config");

// The module we want to resolve, e.g., `numpy.linalg`
let target = ModuleName::from_str("numpy.linalg");

// Optional: the file that contains the import statement
let origin = None;

// We prefer interface (`.pyi`) files but fall back to implementation (`.py`)
let style_filter = None;

// Collect phantom-path diagnostics (optional)
let mut phantom_paths = None;

// Directory cache to speed up repeated lookups
let dir_cache = pyrefly::module::finder::DirEntryCache::new(true);

// Timing counters for profiling (optional)
let timing = None;

// Resolve the import
let result = find_import_internal(
    &config,
    target,
    origin,
    style_filter,
    &mut phantom_paths,
    &dir_cache,
    timing,
);

match result {
    pyrefly::state::loader::FindingOrError::Ok(path) => {
        println!("Found module at {}", path.display());
    }
    pyrefly::state::loader::FindingOrError::Error(err) => {
        eprintln!("Import resolution failed: {:?}", err);
    }
}

This snippet demonstrates the typical usage path: load a ConfigFile, construct a ModuleName, and invoke find_import_internal. The function internally traverses the hierarchical search order and returns either a concrete filesystem path or a FindError indicating issues like Ignored or MissingSource.

Key Source Files and Architecture

File Role
pyrefly/lib/module/finder.rs Core import-resolution engine implementing search logic, caching, and namespace handling
pyrefly/lib/module/typeshed.rs Provides bundled standard-library stub lookup
pyrefly/lib/module/third_party.rs Handles third-party stub discovery via get_bundled_third_party
pyrefly/lib/module/typeshed_third_party.rs Looks up third-party stubs shipped with Pyrefly
pyrefly/lib/module/parse.rs Parses discovered files into ASTs after resolution

Summary

  • Pyrefly's import resolution centers on find_import_internal in finder.rs, implementing Python's import semantics with a configurable hierarchical search path.
  • Performance optimization comes from DirEntryCache (lines 441‑518), which minimizes filesystem operations through in-memory caching of directory listings.
  • Namespace package support uses NamespaceAccumulator (lines 403‑447) to handle both legacy pkgutil.extend_path packages and PEP 420 implicit namespace packages.
  • Stub prioritization follows a strict order: -stubs packages, third-party typeshed, bundled typeshed, then implementation files, managed by resolve_third_party_stub.
  • Precedence rules in FindResult::best_result (lines 889‑906) ensure regular packages outrank legacy namespaces, which outrank single-file modules and compiled extensions.

Frequently Asked Questions

How does Pyrefly handle namespace packages differently from standard Python?

Pyrefly's NamespaceAccumulator (lines 403‑447) explicitly tracks multiple directory roots containing the same package name, building either a LegacyNamespacePackage (when is_pkgutil_namespace detects pkgutil.extend_path in the first 4KiB of __init__.py) or an ImplicitNamespacePackage (no __init__.py present). This matches Python's __path__ extension semantics while maintaining compatibility with both legacy and PEP 420 namespace packages.

What is the performance impact of Pyrefly's directory caching?

The DirEntryCache (lines 441‑518) stores directory entries in a HashMap<PathBuf, Option<Arc<SmallMap<OsString, bool>>>>, reducing stat and readdir operations to O(1) lookups. This optimization is critical for large repositories or remote filesystems like EdenFS, where repeated directory traversals would create significant overhead during import resolution.

How does Pyrefly prioritize between .py and .pyi stub files?

The style_filter parameter in find_import_internal allows explicit preference for interface (.pyi) or implementation (.py) files. When resolving third-party modules, the system first checks for packages with a -stubs suffix, then consults bundled typeshed definitions via resolve_third_party_stub (lines 657‑692) before falling back to standard module discovery.

Where does Pyrefly search for modules during resolution?

Pyrefly follows a strict hierarchical order defined in find_import_internal (lines 1219‑1234): build-system paths, source-database cache, user search_path, custom typeshed, bundled standard-library typeshed, heuristic paths, site-packages (including third-party stubs), extra extensions, and implicit namespace fallbacks. This sequence ensures compatibility with Python's module search semantics while allowing configuration overrides.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →