# How the uv-cache Crate Manages Package Caching and Deduplication in uv

> Discover how the uv-cache crate achieves efficient package caching and deduplication using content-addressed storage and atomic indexing for speed and reliability.

- Repository: [Astral/uv](https://github.com/astral-sh/uv)
- Tags: internals
- Published: 2026-03-01

---

**The `uv-cache` crate in astral-sh/uv uses content-addressed storage based on SHA-256 hashes to ensure that identical Python packages are stored only once on disk, using hard links to deduplicate across different package names or versions while maintaining a thread-safe, atomic index for concurrent access.**

The `uv-cache` crate is the storage engine behind uv, the high-performance Python package manager developed by Astral. It handles how downloaded wheels, source distributions, and other artifacts persist on the local filesystem. Understanding how the uv-cache crate manages package caching and deduplication reveals why uv can install dependencies repeatedly without re-downloading identical files, even across different virtual environments or project contexts.

## Content-Addressed Storage Architecture

At the heart of the deduplication strategy is **content-addressed storage**. Instead of organizing files by package name or version, the cache uses the cryptographic hash of the file contents as the primary key.

### SHA-256 Hashing and Canonical Paths

When a wheel or source archive is downloaded, the crate computes its SHA-256 hash immediately. The file is then moved to a canonical location under `~/.cache/uv/archives/<hash>.whl` (or `.tar.gz`). This path is derived solely from the content hash, meaning two identical wheels from different PyPI indexes will collide at the same filesystem path, triggering the deduplication logic.

The hash is also persisted in a side-car metadata file alongside the archive, allowing the cache to verify integrity on every read without recomputing the hash.

### The Metadata Index

To map human-readable package identifiers to these opaque hashes, [`crates/uv-cache/src/by_timestamp.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/by_timestamp.rs) maintains a timestamp-ordered index. This lightweight registry records the package name, version, content hash, last-access time, and the canonical path. The index is updated atomically when new entries are inserted, ensuring that concurrent readers always see a consistent state.

## Deduplication Workflow

The deduplication process is orchestrated in [`crates/uv-cache/src/wheel.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/wheel.rs) and the main `Cache` struct in [`crates/uv-cache/src/lib.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/lib.rs). The workflow follows a strict sequence to minimize disk I/O and guarantee atomicity.

### Download and Hash Computation

When the resolver requests a package, the cache first checks the index for an existing entry. If none exists, the download begins:

1. The artifact is streamed to a temporary file in `~/.cache/uv/tmp/`.
2. As bytes arrive, the SHA-256 hash is updated incrementally.
3. Once the download completes, the hash is finalized.

### Cache Hits and Link Creation

With the hash computed, the cache performs a lookup via `Cache::lookup_hash`:

- **Cache Hit**: If `archives/<hash>.whl` already exists, the temporary file is discarded. The cache then creates a **hard link** (or symbolic link on platforms without hard-link support) from the canonical archive to the package-specific reference path (`packages/<name>/<version>/wheel.whl`). This ensures the physical bytes exist only once while the logical tree remains intact.
- **Cache Miss**: If the hash is novel, the temporary file is atomically moved to `archives/<hash>.whl`. The index is updated, and a hard link is created to the reference path.

This linking strategy is the core mechanism that allows uv to deduplicate packages across different virtual environments and projects without copying files.

## Cache Organization and Layout

The physical layout on disk reflects the dual nature of the cache: content-addressed storage versus human-readable references.

- **`~/.cache/uv/archives/`**: Contains the canonical, hashed artifacts. Files here are immutable and shared globally.
- **`~/.cache/uv/packages/`**: Contains reference links organized by package name and version. These are cheap to create and delete.
- **`~/.cache/uv/tmp/`**: Temporary download staging area, cleaned up automatically after successful insertion.

This structure is defined in [`crates/uv-cache/src/archive.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/archive.rs), which handles the generic archive layout, while [`wheel.rs`](https://github.com/astral-sh/uv/blob/main/wheel.rs) specializes it for Python wheels.

## Concurrency and Thread Safety

The cache is designed to be accessed by multiple parallel uv processes (e.g., during concurrent resolves in a monorepo). Safety is guaranteed through file-based locking.

In [`crates/uv-cache/src/lib.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/lib.rs), all mutating operations acquire an exclusive lock using `fs2::FileExt::lock_exclusive`. This ensures that only one process can write to the index or move files into the canonical archive directory at a time. Read operations do not block each other and can proceed concurrently.

The public API (`Cache::add`, `Cache::get`, `Cache::remove_stale`) is implemented to be both `Send` and `Sync`, allowing safe sharing across async tasks and threads.

## Cache Eviction and Maintenance

To prevent unbounded disk growth, [`crates/uv-cache/src/removal.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/removal.rs) implements a least-recently-used (LRU) eviction policy.

The eviction routine, triggered when the cache exceeds a user-defined size limit, walks the index ordered by last-access timestamp (maintained in [`by_timestamp.rs`](https://github.com/astral-sh/uv/blob/main/by_timestamp.rs)). It removes the oldest entries from both the canonical archive directory and the reference tree until the size constraint is satisfied. Because references are just links, removing a reference does not delete the underlying data until the last link is removed and the canonical entry is evicted.

## CLI Interface

The cache functionality is exposed to users through the command-line interface defined in [`crates/uv-cache/src/cli.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/cli.rs). This provides thin wrappers around the library logic for manual cache management:

- **`uv cache dir`**: Prints the path to the active cache directory.
- **`uv cache clean`**: Removes all cached artefacts and references, effectively resetting the cache.
- **`uv cache list`**: Displays the contents of the cache, showing package names, versions, and their corresponding content hashes.

These commands use the same deduplication and eviction logic as the internal resolver, ensuring consistency between automatic and manual cache operations.

## Summary

- The **uv-cache crate** implements a content-addressed storage system where Python packages are identified by SHA-256 hashes rather than names or versions.
- **Deduplication** is achieved by storing canonical files in `archives/<hash>` and creating hard links to reference paths, ensuring identical content exists only once on disk.
- **Concurrency safety** is maintained through file-based exclusive locks (`fs2::FileExt::lock_exclusive`) and a `Send + Sync` API.
- **Eviction** uses an LRU strategy based on last-access timestamps to keep cache size within configured limits.
- The implementation spans key files including [`lib.rs`](https://github.com/astral-sh/uv/blob/main/lib.rs), [`wheel.rs`](https://github.com/astral-sh/uv/blob/main/wheel.rs), [`archive.rs`](https://github.com/astral-sh/uv/blob/main/archive.rs), [`by_timestamp.rs`](https://github.com/astral-sh/uv/blob/main/by_timestamp.rs), [`removal.rs`](https://github.com/astral-sh/uv/blob/main/removal.rs), and [`cli.rs`](https://github.com/astral-sh/uv/blob/main/cli.rs).

## Frequently Asked Questions

### How does uv-cache determine if a package is already cached?

When a download completes, the crate computes the SHA-256 hash of the artifact and queries the index via `Cache::lookup_hash`. If `archives/<hash>.whl` exists, the cache considers it a hit and creates a hard link to the reference path instead of storing a duplicate.

### What happens when the cache runs out of disk space?

The eviction routine in [`crates/uv-cache/src/removal.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/removal.rs) walks the timestamp-ordered index from [`by_timestamp.rs`](https://github.com/astral-sh/uv/blob/main/by_timestamp.rs) and deletes the oldest entries first. It removes both reference links and, when the last reference is gone, the canonical archive file, until the cache size falls below the configured limit.

### Is the uv-cache safe to use with parallel uv processes?

Yes. All write operations acquire an exclusive file lock using `fs2::FileExt::lock_exclusive`, ensuring atomic updates to the index and archive directory. The public API is `Send + Sync`, allowing concurrent access from multiple threads or async tasks without corruption.

### How can I manually inspect or clean the uv cache?

The CLI interface in [`crates/uv-cache/src/cli.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-cache/src/cli.rs) exposes commands like `uv cache dir` to show the cache location, `uv cache list` to view entries with their hashes, and `uv cache clean` to purge all data. These commands invoke the same internal logic used by the resolver, ensuring consistent behavior.