How Pyrefly Handles Jaxtyping Tensor Shape Annotations: A Deep Dive into the Type System
Pyrefly parses Jaxtyping tensor shape annotations by recognizing wrapper classes like Float or Shaped, tokenizing the shape string into dimension variables or literals, and converting the result into an internal TensorType tagged with TensorSyntax::Jaxtyping.
The facebook/pyrefly type checker provides first-class support for Jaxtyping-style tensor annotations, treating syntax like Float[Tensor, "batch channels"] as a distinct parsing path within its semantic analysis pipeline. This implementation bridges the gap between Python's runtime string-based shape specifications and Pyrefly's static type system through a three-phase recognition, parsing, and integration process.
Recognizing Jaxtyping Wrapper Classes
When Pyrefly encounters a subscript expression like Float[Tensor, "batch channels"], it first determines whether the left-hand identifier represents a Jaxtyping dtype wrapper. This check occurs in AnswersSolver::is_jaxtyping_wrapper within pyrefly/lib/alt/jaxtyping.rs (lines 69-98).
The function validates the class's fully-qualified name against a hard-coded list of recognized wrappers including Float, Int, Shaped, and other Jaxtyping primitives. Once identified, the parser expects exactly two arguments: a tensor base class (e.g., torch.Tensor) and a literal string describing the shape.
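The recognition step can be modeled in a few lines of Python using the standard ast module. This is only a sketch of the idea, not Pyrefly's Rust code: the JAXTYPING_WRAPPERS set is a hypothetical subset of the real list, split_jaxtyping_annotation is an invented helper name, and the sketch assumes the bare wrapper name was imported from jaxtyping rather than resolving it properly.

```python
import ast

# Hypothetical subset of wrapper names; the real list lives in Pyrefly's Rust source.
JAXTYPING_WRAPPERS = {"jaxtyping.Float", "jaxtyping.Int", "jaxtyping.Shaped"}


def split_jaxtyping_annotation(source, module="jaxtyping"):
    """Return (base, shape_string) for Wrapper[Base, "shape"], else None."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Subscript) or not isinstance(node.value, ast.Name):
        return None
    # Assumption: the bare name came from `module`; a real checker resolves imports.
    if f"{module}.{node.value.id}" not in JAXTYPING_WRAPPERS:
        return None
    args = node.slice  # Python 3.9+: the subscript expression itself
    if not isinstance(args, ast.Tuple) or len(args.elts) != 2:
        return None  # exactly two arguments are expected
    base, shape = args.elts
    if not (isinstance(shape, ast.Constant) and isinstance(shape.value, str)):
        return None  # the second argument must be a literal string
    return ast.unparse(base), shape.value
```

Anything that is not a two-argument subscript of a known wrapper returns None, so non-Jaxtyping annotations such as List[int] would continue down the ordinary annotation path.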
Parsing the Shape String
The core parsing logic transforms the Jaxtyping shape string into a structured TensorShape. The helper parse_jaxtyping_dim_tokens processes whitespace-separated tokens and constructs appropriate Type nodes for each dimension.
Anonymous Dimensions and Literals
Pyrefly maps specific token patterns to internal type representations:
- Underscore (_) represents an anonymous dimension, converted to Type::Any(AnyStyle::Implicit) (lines 31-38)
- Integer literals (e.g., 3, 256) become Type::Size(SizeExpr::Literal) nodes (lines 39-44)
- Parenthesized arithmetic expressions like (N + M) are stripped and passed to parse_jaxtyping_arithmetic, which builds SizeExpr::Add or SizeExpr::Sub nodes (lines 45-67)
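The token mapping above can be illustrated with a small Python model. The class names (AnyDim, Literal, Named, BinOp) and the functions are hypothetical stand-ins for the Rust Type and SizeExpr nodes. The sketch also assumes that a parenthesized group counts as a single token even when it contains spaces, and that arithmetic is a left-associative chain of + and -.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AnyDim:
    """Anonymous dimension, written `_`."""


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Named:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "-"
    left: "Dim"
    right: "Dim"


Dim = Union[AnyDim, Literal, Named, BinOp]

# Assumption: a parenthesized group is one token, otherwise split on whitespace.
TOKEN_RE = re.compile(r"\([^()]*\)|\S+")


def parse_atom(text: str) -> Dim:
    if text == "_":
        return AnyDim()
    if text.isdecimal():
        return Literal(int(text))
    return Named(text)


def parse_arithmetic(expr: str) -> Dim:
    # "N + M - 1" splits into ["N", "+", "M", "-", "1"].
    parts = re.split(r"\s*([+-])\s*", expr.strip())
    result = parse_atom(parts[0])
    for op, operand in zip(parts[1::2], parts[2::2]):
        result = BinOp(op, result, parse_atom(operand))
    return result


def parse_dims(shape: str) -> list:
    dims = []
    for token in TOKEN_RE.findall(shape):
        if token.startswith("(") and token.endswith(")"):
            dims.append(parse_arithmetic(token[1:-1]))  # strip the parentheses
        else:
            dims.append(parse_atom(token))
    return dims
```

For instance, parse_dims("_ 3 (N + M)") yields an anonymous dimension, the literal 3, and an addition node over the two names.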
Named Dimensions and Type Variables
Named identifiers in the shape string (e.g., batch, channels) introduce implicit type variables. The solver invokes AnswersSolver::get_or_create_jaxtyping_dim to obtain or create these variables, wrapping them in Type::Quantified nodes (lines 56-58). These quantified types participate in unification during later type inference phases.
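The get-or-create behavior matters because the same name must map to the same variable every time it appears, otherwise two uses of batch could never be unified. A minimal Python sketch of that idea follows. DimScope and this simplified Quantified are hypothetical names for illustration and do not mirror the Rust types.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Quantified:
    """Simplified stand-in for an implicit shape type variable."""
    name: str
    uid: int


class DimScope:
    """Hands out one variable per dimension name within a scope."""

    def __init__(self):
        self._vars = {}
        self._ids = count()

    def get_or_create(self, name: str) -> Quantified:
        # Reuse the existing variable so repeated names refer to one dimension.
        if name not in self._vars:
            self._vars[name] = Quantified(name, next(self._ids))
        return self._vars[name]
```

With this scope, the batch in a parameter annotation and the batch in a return annotation resolve to the same object, while channels gets a fresh variable of its own.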
Variadic Shapes and Ellipsis
For variadic tensor shapes, Pyrefly handles two patterns:
- Named variadics (*name or *#name) become a Quantified type of kind TypeVarTuple
- Ellipsis (...) is represented as an unbounded tuple via any_tuple() (lines 86-102)
The token list is examined for variadic markers, and the resulting dimensions are assembled using either TensorShape::from_types for concrete shapes or TensorShape::unpacked for variadic shapes with prefix/middle/suffix splits.
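The split around a variadic marker can be sketched as follows. The function name split_shape and the tuple-based return values are invented for illustration, and rejecting more than one variadic marker is an assumption of this sketch rather than documented Pyrefly behavior.

```python
def split_shape(tokens):
    """Classify a token list as concrete or as prefix/middle/suffix around a variadic."""
    variadic = [i for i, t in enumerate(tokens) if t == "..." or t.startswith("*")]
    if not variadic:
        return ("concrete", tokens)
    if len(variadic) > 1:
        # Assumption: at most one variadic marker per shape.
        raise ValueError("at most one variadic marker is supported in this sketch")
    i = variadic[0]
    return ("unpacked", tokens[:i], tokens[i], tokens[i + 1:])
```

A shape such as "batch *rest channels" splits into the prefix ["batch"], the middle "*rest", and the suffix ["channels"], while "batch channels" stays concrete.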
Building the Tensor Type
Once parsing completes, Pyrefly constructs a TensorType (defined in crates/pyrefly_types/src/tensor.rs) combining the original tensor base class and the computed TensorShape. The implementation sets the syntax field to TensorSyntax::Jaxtyping via TensorType::with_syntax (lines 46-55).
The TensorSyntax enum distinguishes between native syntax (Tensor[N, M]) and Jaxtyping syntax, controlling rendering without affecting type identity (lines 30-43). When displaying errors or type information, TensorShape::fmt_jaxtyping generates the canonical string representation (lines 42-52).
# Python source using Jaxtyping annotations
from jaxtyping import Float
from torch import Tensor

def linear(x: Float[Tensor, "batch in_features"]) -> Float[Tensor, "batch out_features"]:
    pass

# Pyrefly internally represents this as:
# Shaped[torch.Tensor, "batch in_features"] -> Shaped[torch.Tensor, "batch out_features"]
// Rust-side type construction using pyrefly_types
use pyrefly_types::tensor::{TensorType, TensorSyntax, TensorShape};
use pyrefly_types::types::{Type, Quantified, QuantifiedKind};
// Simulating: Float[Tensor, "batch channels"]
let base = /* resolved tensor class */;
let shape = TensorShape::from_types(vec![
Type::Quantified(Box::new(Quantified::new("batch".into(), QuantifiedKind::TypeVar))),
Type::Quantified(Box::new(Quantified::new("channels".into(), QuantifiedKind::TypeVar))),
]);
let jaxtyping_type = TensorType::new(base, shape)
.with_syntax(TensorSyntax::Jaxtyping);
assert_eq!(format!("{}", jaxtyping_type),
r#"Shaped[torch.Tensor, "batch channels"]"#);
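The claim that the syntax tag controls rendering without affecting type identity can be modeled in Python with a dataclass field that is excluded from comparison. This is a simplified analogy and not the Rust implementation: the classes below reuse the names TensorType and TensorSyntax for readability, but their fields and output format are assumptions based on the display strings shown in this article.

```python
from dataclasses import dataclass, field
from enum import Enum


class TensorSyntax(Enum):
    NATIVE = "native"
    JAXTYPING = "jaxtyping"


@dataclass(frozen=True)
class TensorType:
    base: str
    dims: tuple
    # compare=False keeps the syntax tag out of __eq__ and __hash__.
    syntax: TensorSyntax = field(default=TensorSyntax.NATIVE, compare=False)

    def __str__(self) -> str:
        names = [str(d) for d in self.dims]
        if self.syntax is TensorSyntax.JAXTYPING:
            shape = " ".join(names)
            return f'Shaped[{self.base}, "{shape}"]'
        return f"{self.base}[{', '.join(names)}]"
```

Two values that differ only in their syntax tag compare equal and hash identically, yet each one prints in the style it was written in.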
Integration with Type Inference
Implicit type variables introduced by shape strings must participate in function-level type inference. Pyrefly collects these variables via AnswersSolver::collect_jaxtyping_tparams (lines 307-327) and adds them to the function's Forall quantifiers. This integration occurs during function definition processing in pyrefly/lib/alt/function.rs (lines 727-730), ensuring shape variables are available for constraint solving.
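Conceptually, the collection step scans every shape string in a signature and gathers each distinct named dimension once. The Python sketch below shows that idea on raw strings. The function collect_shape_params is hypothetical, works on tokens rather than on parsed types, and assumes first-seen ordering, which the article does not specify.

```python
def collect_shape_params(shapes):
    """Collect named dimensions across shape strings, in first-seen order."""
    seen = {}
    for shape in shapes:
        for token in shape.split():
            name = token.lstrip("*#")  # drop variadic markers such as *name or *#name
            # Skip literals, the ellipsis, and the anonymous `_` dimension.
            if name.isidentifier() and name != "_":
                seen.setdefault(name, None)
    return list(seen)
```

For the linear example, the parameter and return shapes together contribute batch, in_features, and out_features, with batch counted only once.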
The system also enforces syntax consistency: a mixed-syntax check prevents native tensor annotations and Jaxtyping annotations from co-existing in the same callable signature, emitting a clear error if detected (lines 341-353).
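A mixed-syntax guard of this kind reduces to checking whether both styles occur in one signature. The sketch below assumes each annotation has already been labeled "native" or "jaxtyping". The function name and the error wording are invented for illustration and are not Pyrefly's actual diagnostic.

```python
from typing import Optional


def mixed_syntax_error(styles_by_param: dict) -> Optional[str]:
    """Return an error message if a signature mixes both tensor syntaxes."""
    native = sorted(p for p, s in styles_by_param.items() if s == "native")
    jax = sorted(p for p, s in styles_by_param.items() if s == "jaxtyping")
    if native and jax:
        native_list = ", ".join(native)
        jax_list = ", ".join(jax)
        return (
            f"cannot mix native tensor syntax ({native_list}) "
            f"with jaxtyping syntax ({jax_list}) in one signature"
        )
    return None
```

A signature that uses only one style returns None, so the guard stays silent unless both styles are present.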
Summary
- Wrapper Detection: is_jaxtyping_wrapper validates dtype wrappers like Float or Shaped against a hard-coded list in pyrefly/lib/alt/jaxtyping.rs
- Token Parsing: parse_jaxtyping_dim_tokens converts shape strings into Type nodes supporting literals, anonymous dimensions (_), named variables, arithmetic expressions, and variadic markers
- Type Construction: TensorType combines the base tensor class with a TensorShape, tracking Jaxtyping origin via TensorSyntax::Jaxtyping
- Inference Integration: collect_jaxtyping_tparams registers implicit shape variables as function quantifiers, while mixed-syntax guards prevent annotation style conflicts
Frequently Asked Questions
What Jaxtyping wrappers does Pyrefly recognize?
Pyrefly recognizes the standard Jaxtyping dtype wrappers, including Float, Int, and Shaped. The is_jaxtyping_wrapper function in pyrefly/lib/alt/jaxtyping.rs validates these by checking the fully qualified class name against a hard-coded list.
How does Pyrefly handle the underscore (_) dimension?
The underscore token maps to Type::Any(AnyStyle::Implicit), representing an anonymous dimension that accepts any size without binding a name. This conversion occurs in parse_jaxtyping_dim_tokens at lines 31-38.
Can Pyrefly mix native tensor syntax with Jaxtyping syntax?
No. Pyrefly enforces syntax uniformity within a single callable. The type checker implements a mixed-syntax validation (lines 341-353) that emits an error if native Tensor[N, M] annotations appear alongside Jaxtyping Shaped[Tensor, "N M"] annotations in the same function signature.
How are shape variables tracked during type inference?
Named dimensions in Jaxtyping strings introduce implicit type variables via get_or_create_jaxtyping_dim. The solver collects these into the function's quantifier list through collect_jaxtyping_tparams (lines 307-327), making them available for unification and constraint solving throughout the type-checking pipeline.