How the uv-cache Crate Manages Package Caching and Deduplication in uv
The uv-cache crate in astral-sh/uv uses content-addressed storage keyed by SHA-256 hashes so that identical Python packages are stored only once on disk. It deduplicates across package names and versions with hard links and keeps a thread-safe, atomically updated index for concurrent access.
The uv-cache crate is the storage engine behind uv, the high-performance Python package manager developed by Astral. It handles how downloaded wheels, source distributions, and other artifacts persist on the local filesystem. Understanding how the uv-cache crate manages package caching and deduplication reveals why uv can install dependencies repeatedly without re-downloading identical files, even across different virtual environments or project contexts.
Content-Addressed Storage Architecture
At the heart of the deduplication strategy is content-addressed storage. Instead of organizing files by package name or version, the cache uses the cryptographic hash of the file contents as the primary key.
SHA-256 Hashing and Canonical Paths
When a wheel or source archive is downloaded, the crate computes its SHA-256 hash as the bytes arrive. The file is then moved to a canonical location under ~/.cache/uv/archives/<hash>.whl (or .tar.gz). This path is derived solely from the content hash, meaning two identical wheels from different PyPI indexes will collide at the same filesystem path, triggering the deduplication logic.
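A minimal sketch of how a canonical path could be derived from nothing but the digest, assuming a lowercase hex SHA-256 string. The function name canonical_archive_path is hypothetical, not the crate's API. Because the index URL never enters the computation, two downloads of the same bytes necessarily map to the same path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns `<cache_root>/archives/<hash>.<ext>`, or `None`
/// if `hash` is not a 64-character lowercase hex SHA-256 digest.
fn canonical_archive_path(cache_root: &Path, hash: &str, ext: &str) -> Option<PathBuf> {
    let is_sha256_hex = hash.len() == 64
        && hash.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if !is_sha256_hex {
        // Refuse to build a path from anything that is not a digest.
        return None;
    }
    // Only the content hash (plus extension) decides the location.
    Some(cache_root.join("archives").join(format!("{}.{}", hash, ext)))
}
```

Rejecting anything that is not a 64-character hex digest also keeps a malformed value from producing a path outside archives/.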
The hash is also persisted in a sidecar metadata file alongside the archive, so the expected digest is available on every read without having to recompute it from the archive.
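The sidecar idea can be sketched with a plain text file next to the archive. The <archive>.sha256 naming and the helper names below are assumptions made for illustration; the article does not specify the sidecar's name or format.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sidecar naming: `<archive>.sha256` next to the archive.
fn sidecar_path(archive: &Path) -> PathBuf {
    let mut name = archive
        .file_name()
        .expect("archive path has a file name")
        .to_os_string();
    name.push(".sha256");
    archive.with_file_name(name)
}

/// Records the digest computed at download time.
fn write_sidecar(archive: &Path, hash: &str) -> io::Result<()> {
    fs::write(sidecar_path(archive), hash)
}

/// Reads the recorded digest back, so a reader knows the expected hash
/// without rehashing the archive first.
fn read_sidecar(archive: &Path) -> io::Result<String> {
    Ok(fs::read_to_string(sidecar_path(archive))?.trim().to_string())
}
```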
The Metadata Index
To map human-readable package identifiers to these opaque hashes, crates/uv-cache/src/by_timestamp.rs maintains a timestamp-ordered index. This lightweight registry records the package name, version, content hash, last-access time, and the canonical path. The index is updated atomically when new entries are inserted, ensuring that concurrent readers always see a consistent state.
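As an illustration only, a timestamp-ordered index can be modelled as a BTreeMap keyed by (last_access, hash), which iterates oldest-first for free. The entry fields follow the list above; the struct names, the tab-separated file format, and the write-then-rename persistence are assumptions, not the crate's actual on-disk format.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical index record mirroring the fields described in the text.
#[derive(Debug, Clone, PartialEq)]
struct IndexEntry { name: String, version: String, hash: String, last_access: u64, path: String }

/// Keyed by (last_access, hash) so iteration order is least recently used first.
#[derive(Default)]
struct Index { by_time: BTreeMap<(u64, String), IndexEntry> }

impl Index {
    fn insert(&mut self, e: IndexEntry) {
        self.by_time.insert((e.last_access, e.hash.clone()), e);
    }

    /// Entries from least to most recently accessed.
    fn oldest_first<'a>(&'a self) -> impl Iterator<Item = &'a IndexEntry> + 'a {
        self.by_time.values()
    }

    /// Writes the whole index to a temp file, then renames it over the old one,
    /// so readers see either the old or the new file, never a partial one.
    fn persist(&self, file: &Path) -> io::Result<()> {
        let mut text = String::new();
        for e in self.by_time.values() {
            text.push_str(&format!("{}\t{}\t{}\t{}\t{}\n", e.name, e.version, e.hash, e.last_access, e.path));
        }
        let tmp = file.with_extension("tmp");
        fs::write(&tmp, text)?;
        fs::rename(&tmp, file)
    }
}
```

The rename step is what makes the update atomic: on POSIX systems, renaming within one filesystem replaces the old file in a single step, so a concurrent reader opens either the complete old index or the complete new one.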
Deduplication Workflow
The deduplication process is orchestrated in crates/uv-cache/src/wheel.rs and the main Cache struct in crates/uv-cache/src/lib.rs. The workflow follows a strict sequence to minimize disk I/O and guarantee atomicity.
Download and Hash Computation
When the resolver requests a package, the cache first checks the index for an existing entry. If none exists, the download begins:
- The artifact is streamed to a temporary file in ~/.cache/uv/tmp/.
- As bytes arrive, the SHA-256 hash is updated incrementally.
- Once the download completes, the hash is finalized.
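The three steps above amount to hashing each chunk as it is written. The sketch below assumes a blocking std::io::Read source and hand-rolls SHA-256 (FIPS 180-4) only so it runs without external crates; stream_to_temp is a hypothetical name, and real code would use a vetted hashing library and async I/O.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

// FIPS 180-4 round constants. Hand-rolled only to keep the sketch dependency-free.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Streaming SHA-256: `update` may be called with chunks of any size.
struct Sha256 { state: [u32; 8], buf: Vec<u8>, len: u64 }

impl Sha256 {
    fn new() -> Self {
        let state = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19];
        Sha256 { state, buf: Vec::with_capacity(64), len: 0 }
    }

    /// Processes one full 64-byte block.
    fn compress(state: &mut [u32; 8], block: &[u8]) {
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
            h = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(t2);
        }
        let v = [a, b, c, d, e, f, g, h];
        for i in 0..8 { state[i] = state[i].wrapping_add(v[i]); }
    }

    fn update(&mut self, mut data: &[u8]) {
        self.len += data.len() as u64;
        while !data.is_empty() {
            let take = (64 - self.buf.len()).min(data.len());
            self.buf.extend_from_slice(&data[..take]);
            data = &data[take..];
            if self.buf.len() == 64 {
                Self::compress(&mut self.state, &self.buf);
                self.buf.clear();
            }
        }
    }

    /// Pads (0x80, zeros, 64-bit length) and returns the lowercase hex digest.
    fn finalize(mut self) -> String {
        let bit_len = self.len.wrapping_mul(8);
        self.update(&[0x80]);
        while self.buf.len() != 56 { self.update(&[0]); }
        self.update(&bit_len.to_be_bytes());
        self.state.iter().map(|w| format!("{:08x}", w)).collect()
    }
}

/// Hypothetical helper: copies `reader` into `tmp_path`, hashing each chunk as it arrives.
fn stream_to_temp<R: Read>(mut reader: R, tmp_path: &Path) -> io::Result<String> {
    let mut out = File::create(tmp_path)?;
    let mut hasher = Sha256::new();
    let mut chunk = [0u8; 8192];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 { break; }
        hasher.update(&chunk[..n]);
        out.write_all(&chunk[..n])?;
    }
    out.sync_all()?;
    Ok(hasher.finalize())
}
```

Hashing during the copy means the bytes are read once; no second pass over the temporary file is needed before the cache lookup.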
Cache Hits and Link Creation
With the hash computed, the cache performs a lookup via Cache::lookup_hash:
- Cache Hit: If archives/<hash>.whl already exists, the temporary file is discarded. The cache then creates a hard link (or symbolic link on platforms without hard-link support) from the canonical archive to the package-specific reference path (packages/<name>/<version>/wheel.whl). This ensures the physical bytes exist only once while the logical tree remains intact.
- Cache Miss: If the hash is novel, the temporary file is atomically moved to archives/<hash>.whl. The index is updated, and a hard link is created to the reference path.
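A synchronous sketch of the hit/miss branch, assuming the digest is already known and that tmp/ and archives/ share a filesystem so the rename is atomic. The names insert_or_link and Outcome are hypothetical. One deliberate deviation: where hard links fail, this sketch falls back to copying rather than to a symbolic link, because std only exposes symlink creation through platform-specific modules.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum Outcome { Hit, Miss }

/// Hypothetical helper: moves or discards the download, then links the
/// canonical archive into `packages/<name>/<version>/wheel.whl`.
fn insert_or_link(root: &Path, tmp: &Path, hash: &str, name: &str, version: &str)
    -> io::Result<(Outcome, PathBuf)> {
    let archive = root.join("archives").join(format!("{}.whl", hash));
    fs::create_dir_all(archive.parent().unwrap())?;
    let outcome = if archive.exists() {
        fs::remove_file(tmp)?; // identical bytes already stored: discard the download
        Outcome::Hit
    } else {
        fs::rename(tmp, &archive)?; // same filesystem, so the move is atomic on POSIX
        Outcome::Miss
    };
    let reference = root.join("packages").join(name).join(version).join("wheel.whl");
    fs::create_dir_all(reference.parent().unwrap())?;
    if reference.exists() {
        fs::remove_file(&reference)?;
    }
    // Prefer a hard link (no extra bytes on disk); fall back to a copy.
    if fs::hard_link(&archive, &reference).is_err() {
        fs::copy(&archive, &reference)?;
    }
    Ok((outcome, reference))
}
```

The exists() check followed by rename() is racy on its own; it is only safe when the caller holds the exclusive lock described under Concurrency and Thread Safety.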
This linking strategy is the core mechanism that allows uv to deduplicate packages across different virtual environments and projects without copying files.
Cache Organization and Layout
The physical layout on disk reflects the dual nature of the cache: content-addressed storage versus human-readable references.
- ~/.cache/uv/archives/: Contains the canonical, hashed artifacts. Files here are immutable and shared globally.
- ~/.cache/uv/packages/: Contains reference links organized by package name and version. These are cheap to create and delete.
- ~/.cache/uv/tmp/: Temporary download staging area, cleaned up automatically after successful insertion.
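The layout can be captured in a small helper so every component agrees on where things live. CacheLayout and its methods are hypothetical names for illustration; the directory names mirror the list above.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical description of the on-disk layout under one cache root.
struct CacheLayout { root: PathBuf }

impl CacheLayout {
    fn archives(&self) -> PathBuf { self.root.join("archives") } // content-addressed, immutable
    fn packages(&self) -> PathBuf { self.root.join("packages") } // human-readable links
    fn tmp(&self) -> PathBuf { self.root.join("tmp") }           // download staging

    /// Reference path for a given package name and version.
    fn reference(&self, name: &str, version: &str) -> PathBuf {
        self.packages().join(name).join(version).join("wheel.whl")
    }

    /// Creates all three top-level directories if they are missing.
    fn init(&self) -> io::Result<()> {
        for dir in [self.archives(), self.packages(), self.tmp()].iter() {
            fs::create_dir_all(dir)?;
        }
        Ok(())
    }
}
```

Keeping tmp/ under the same root as archives/ matters: a rename from staging into the archive directory is only atomic when both sit on the same filesystem.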
This structure is defined in crates/uv-cache/src/archive.rs, which handles the generic archive layout, while wheel.rs specializes it for Python wheels.
Concurrency and Thread Safety
The cache is designed to be accessed by multiple parallel uv processes (e.g., during concurrent resolves in a monorepo). Safety is guaranteed through file-based locking.
In crates/uv-cache/src/lib.rs, all mutating operations acquire an exclusive lock using fs2::FileExt::lock_exclusive. This ensures that only one process can write to the index or move files into the canonical archive directory at a time. Read operations do not block each other and can proceed concurrently.
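To stay within the standard library, the sketch below swaps in a different technique from the one the article describes: an exclusive lock file created with OpenOptions::create_new, instead of the advisory fs2::FileExt::lock_exclusive. The helper names are hypothetical. It provides the same guarantee (one writer at a time, across processes) with a simpler mechanism.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// Guard that releases this lock-file-based lock by deleting the file.
struct LockGuard { path: PathBuf }

impl Drop for LockGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// `create_new` fails if the file already exists, so at most one caller
/// (in any process) can hold the lock at a time.
fn try_lock(path: &Path) -> io::Result<Option<LockGuard>> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(_file) => Ok(Some(LockGuard { path: path.to_path_buf() })),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
        Err(e) => Err(e),
    }
}

/// Waits (by polling) until the lock is free, runs `f` while holding it, then releases.
fn with_exclusive_lock<T, F: FnOnce() -> T>(path: &Path, f: F) -> io::Result<T> {
    loop {
        if let Some(_guard) = try_lock(path)? {
            return Ok(f());
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

The trade-off: if a process crashes while holding this kind of lock, the lock file stays behind and must be removed by hand, whereas the operating system releases advisory locks such as those taken by lock_exclusive when the holding process exits.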
The Cache type behind the public API (Cache::add, Cache::get, Cache::remove_stale) is both Send and Sync, so it can be shared safely across async tasks and threads.
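Send and Sync are properties of types rather than methods, so the practical check is whether the cache handle itself satisfies both. A minimal sketch, assuming an in-memory index behind Arc<RwLock<...>>; the Cache struct and its add/get signatures here are hypothetical stand-ins, not the crate's real types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical cache handle: cloning shares the same underlying index.
#[derive(Clone, Default)]
struct Cache { index: Arc<RwLock<HashMap<String, String>>> }

impl Cache {
    /// Writers take the exclusive side of the RwLock.
    fn add(&self, key: &str, hash: &str) {
        self.index.write().unwrap().insert(key.to_string(), hash.to_string());
    }

    /// Readers share the lock and do not block each other.
    fn get(&self, key: &str) -> Option<String> {
        self.index.read().unwrap().get(key).cloned()
    }
}

/// Compiles only if `T` is both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}
```

A call to assert_send_sync::<Cache>() stops compiling if the type ever loses either property, which makes it a cheap guard against accidental regressions.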
Cache Eviction and Maintenance
To prevent unbounded disk growth, crates/uv-cache/src/removal.rs implements a least-recently-used (LRU) eviction policy.
The eviction routine, triggered when the cache exceeds a user-defined size limit, walks the index ordered by last-access timestamp (maintained in by_timestamp.rs). It removes the oldest entries from both the canonical archive directory and the reference tree until the size constraint is satisfied. Because references are just links, removing a reference does not delete the underlying data until the last link is removed and the canonical entry is evicted.
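The eviction walk can be sketched as: sort by last access, then drop the oldest entries until the running total fits the limit. Entry, evict_lru, and the byte sizes are illustrative assumptions; the sketch ignores link counts and simply treats each entry's size as reclaimable.

```rust
/// Hypothetical cache entry with its on-disk size and last-access time.
#[derive(Debug, Clone)]
struct Entry { key: String, size: u64, last_access: u64 }

/// Returns (kept entries, evicted keys), evicting least recently used first
/// until the total size is at or below `limit`.
fn evict_lru(mut entries: Vec<Entry>, limit: u64) -> (Vec<Entry>, Vec<String>) {
    entries.sort_by_key(|e| e.last_access); // oldest first
    let mut total: u64 = entries.iter().map(|e| e.size).sum();
    let mut kept = Vec::new();
    let mut evicted = Vec::new();
    for e in entries {
        if total > limit {
            total -= e.size;
            evicted.push(e.key);
        } else {
            kept.push(e);
        }
    }
    (kept, evicted)
}
```

For entries a (40 bytes, accessed at t=1), c (50, t=2), and b (30, t=3) with a 60-byte limit, the walk evicts a (120 to 80 bytes) and then c (80 to 30 bytes), and keeps b.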
CLI Interface
The cache functionality is exposed to users through the command-line interface defined in crates/uv-cache/src/cli.rs. This provides thin wrappers around the library logic for manual cache management:
- uv cache dir: Prints the path to the active cache directory.
- uv cache clean: Removes all cached artifacts and references, effectively resetting the cache.
- uv cache list: Displays the contents of the cache, showing package names, versions, and their corresponding content hashes.
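A rough sketch of what thin wrappers over the library might look like. CacheCommand, parse, and run are hypothetical; run returns the lines it would print so it can be tested, and the List arm only lists archive file names, whereas the article describes output that also includes package names and versions from the index.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical subcommands mirroring the list above.
#[derive(Debug, PartialEq)]
enum CacheCommand { Dir, Clean, List }

fn parse(sub: &str) -> Option<CacheCommand> {
    match sub {
        "dir" => Some(CacheCommand::Dir),
        "clean" => Some(CacheCommand::Clean),
        "list" => Some(CacheCommand::List),
        _ => None,
    }
}

/// Executes a command against the cache at `root`, returning output lines.
fn run(cmd: &CacheCommand, root: &Path) -> io::Result<Vec<String>> {
    match cmd {
        CacheCommand::Dir => Ok(vec![root.display().to_string()]),
        CacheCommand::Clean => {
            if root.exists() {
                fs::remove_dir_all(root)?; // wipes archives, references, and staging
            }
            Ok(vec![])
        }
        CacheCommand::List => {
            let mut names = Vec::new();
            for entry in fs::read_dir(root.join("archives"))? {
                names.push(entry?.file_name().to_string_lossy().into_owned());
            }
            names.sort();
            Ok(names)
        }
    }
}
```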
These commands use the same deduplication and eviction logic as the internal resolver, ensuring consistency between automatic and manual cache operations.
Summary
- The uv-cache crate implements a content-addressed storage system where Python packages are identified by SHA-256 hashes rather than names or versions.
- Deduplication is achieved by storing canonical files in archives/<hash> and creating hard links to reference paths, ensuring identical content exists only once on disk.
- Concurrency safety is maintained through file-based exclusive locks (fs2::FileExt::lock_exclusive) and a Send + Sync API.
- Eviction uses an LRU strategy based on last-access timestamps to keep cache size within configured limits.
- The implementation spans key files including lib.rs, wheel.rs, archive.rs, by_timestamp.rs, removal.rs, and cli.rs.
Frequently Asked Questions
How does uv-cache determine if a package is already cached?
When a download completes, the crate computes the SHA-256 hash of the artifact and queries the index via Cache::lookup_hash. If archives/<hash>.whl exists, the cache considers it a hit and creates a hard link to the reference path instead of storing a duplicate.
What happens when the cache exceeds its size limit?
The eviction routine in crates/uv-cache/src/removal.rs walks the timestamp-ordered index from by_timestamp.rs and deletes the oldest entries first. It removes both reference links and, when the last reference is gone, the canonical archive file, until the cache size falls below the configured limit.
Is the uv cache safe to use with parallel uv processes?
Yes. All write operations acquire an exclusive file lock using fs2::FileExt::lock_exclusive, ensuring atomic updates to the index and archive directory across processes. Within a single process, the Cache type is Send + Sync, allowing concurrent access from multiple threads or async tasks without corruption.
How can I manually inspect or clean the uv cache?
The CLI interface in crates/uv-cache/src/cli.rs exposes commands like uv cache dir to show the cache location, uv cache list to view entries with their hashes, and uv cache clean to purge all data. These commands invoke the same internal logic used by the resolver, ensuring consistent behavior.