# Pandas Read JSON File: Efficient Methods for Nested Structures

> Learn the most efficient pandas read json methods for nested structures. Discover how pd.read_json and pd.json_normalize handle complex data for faster analysis in Python.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-16

---

**The most efficient way to read a nested JSON file with pandas is to parse it with `pd.read_json()` (or the standard `json` module) and flatten it with `pd.json_normalize()`. Called without normalization arguments, `json_normalize` takes the `_simple_json_normalize` fast path in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py), while complex extractions require explicit `record_path` and `meta` parameters. Note that `pd.read_json()` on its own parses the file but does not flatten nested objects.**

In the pandas-dev/pandas codebase, ingesting hierarchical JSON data involves two separate steps. `read_json` delegates to a `JsonReader` class that parses the text and builds a DataFrame, leaving nested objects as Python dictionaries inside object columns. Flattening happens in `json_normalize`, which selects between a high-speed single-pass flattening routine and the full record-extraction algorithm depending on the arguments it receives.

## How Pandas Read JSON File Handles Nested Structures

Inside [`pandas/io/json/_json.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_json.py), `read_json()` does not inspect nesting: it hands the decoded payload to the frame parser, so a nested object ends up as a dictionary in a single column. The routing decision lives in `json_normalize()` in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py). When you provide no `record_path`, `meta`, `meta_prefix`, `record_prefix`, or `max_level` parameters, the function hits an early-return branch that invokes `_simple_json_normalize` rather than the full record-extraction logic. This check ensures common cases receive optimized treatment without manual configuration.

The `_simple_json_normalize` helper (lines 558-572 in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py)) performs a single recursive walk that flattens dictionaries while preserving column order. Unlike the full-featured normalization path, it skips record pulling and metadata bookkeeping, making it well suited to the standard pattern `pd.json_normalize(records)`.
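A small sketch of the difference, using made-up inline records rather than a real file: `pd.read_json` keeps the nested `info` object as a dictionary in one column, while `pd.json_normalize` called with only the data expands it into dot-separated columns. The sample payload and variable names are illustrative assumptions.

```python
import io
import json

import pandas as pd

# Hypothetical sample payload; any list of nested records behaves the same way.
raw = '[{"id": 1, "info": {"author": "Alice"}}, {"id": 2, "info": {"author": "Bob"}}]'

# read_json parses the text but keeps each nested object as a dict.
df_raw = pd.read_json(io.StringIO(raw))

# json_normalize with only the data argument flattens nested dicts.
df_flat = pd.json_normalize(json.loads(raw))

print(df_raw.columns.tolist())   # ['id', 'info']
print(df_flat.columns.tolist())  # ['id', 'info.author']
```

If a file has already been loaded with `read_json`, a single dictionary column can still be flattened afterwards by passing `df["info"].tolist()` to `pd.json_normalize`.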

## The Automatic Fast Path

For most nested JSON objects where you simply need a flattened table, **let pandas handle the optimization automatically**. When `json_normalize` detects the "basic case" (no explicit normalization arguments supplied), it routes data through `_simple_json_normalize` instead of the more complex record-extraction path.

This fast path flattens each dictionary with `_normalize_json_ordered`, which places top-level scalar keys ahead of the flattened nested keys, completing the transformation in a single pass. When the records come from a file, the parsing step performed by `read_json` uses the vendored `ujson` C engine located in [`pandas/_libs/src/vendored/ujson/python/ujson.c`](https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/src/vendored/ujson/python/ujson.c).

## When to Use Explicit json_normalize

If your data requires extracting specific sub-arrays (like an `"items"` list inside each record) while preserving parent metadata fields, call **`pd.json_normalize()`** directly. This function, implemented starting at line 300 in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py), builds upon the same low-level utilities but adds support for:

- `record_path`: Target specific nested lists for extraction
- `meta`: Include parent fields as columns in the final frame
- `record_prefix` and `meta_prefix`: Prevent column name collisions
- `max_level`: Limit flattening recursion depth

The implementation uses `_pull_field` and `_pull_records` to fetch nested data, then flattens each record with `nested_to_record` (lines 70-78 in [`_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py)).
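The effect of `max_level` from the list above can be illustrated with a short sketch. The records here are hypothetical, and the comparison only shows how far flattening proceeds, not how the internals are organized.

```python
import pandas as pd

# Hypothetical record with two levels of nesting under "info".
data = [
    {"id": 1, "info": {"year": 2020, "author": {"first": "Alice", "last": "Smith"}}},
]

# max_level=1 flattens one level: "info" is expanded, "info.author" stays a dict.
shallow = pd.json_normalize(data, max_level=1)

# Without max_level every nested dict is expanded.
full = pd.json_normalize(data)

print(sorted(shallow.columns))
print(sorted(full.columns))
```

Limiting the depth is useful when a deeply nested branch should stay intact for later, separate processing.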

## Streaming Large Files

For datasets exceeding available memory, use the **chunking interface** provided by `JsonReader`. In [`pandas/io/json/_json.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_json.py), the `JsonReader` class (lines 990-1014) implements an iterator protocol that yields DataFrame chunks without loading the entire file into memory.

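Chunked reading requires `lines=True` and uses the default `ujson` engine. The optional `pyarrow` engine also reads line-delimited JSON, but it does not accept `chunksize`. Chunks are parsed sequentially, so peak memory is governed by the chunk size rather than the file size.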
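As a minimal sketch of the chunking interface, the example below aggregates across chunks without ever holding the full dataset. It uses an in-memory `StringIO` buffer as a stand-in for a large file, and the field names `id` and `qty` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical line-delimited payload standing in for a large file on disk.
ndjson = "\n".join(f'{{"id": {i}, "qty": {i * 2}}}' for i in range(5))

total_rows = 0
total_qty = 0

# chunksize requires lines=True; the reader can be used as a context manager.
with pd.read_json(io.StringIO(ndjson), lines=True, chunksize=2) as reader:
    for chunk in reader:
        total_rows += len(chunk)
        total_qty += int(chunk["qty"].sum())

print(total_rows, total_qty)
```

Keeping only running totals, instead of appending chunks to a list, is what keeps the memory footprint small.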

## Code Examples

### Flat Line-Delimited JSON (Fastest Path)

```python
import pandas as pd

# File contains one JSON object per line
df = pd.read_json("data/line_delimited.json", lines=True, orient="records")
print(df.head())
```

Behind the scenes: `read_json` reads the lines through `JsonReader._read_ujson`, building the DataFrame directly via `FrameParser` without normalization overhead.

### Nested JSON with Automatic Flattening

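```python
import json

import pandas as pd

# Parse the file, then flatten; passing only the data triggers the fast path
with open("data/nested.json") as fh:
    records = json.load(fh)

df = pd.json_normalize(records)
print(df.head())
```

This call takes the early-return branch of `json_normalize` in [`_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py), calling `_simple_json_normalize` for single-pass recursive flattening. Calling `pd.read_json("data/nested.json")` alone would load the file but leave nested objects as dictionaries in their columns.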

### Extracting Sub-Lists with Metadata

```python
import pandas as pd

data = [
    {"id": 1, "info": {"author": "Alice"}, "items": [{"sku": "A", "qty": 2},
                                                    {"sku": "B", "qty": 5}]},
    {"id": 2, "info": {"author": "Bob"},   "items": [{"sku": "C", "qty": 1}]}
]

df = pd.json_normalize(
    data,
    record_path="items",
    meta=["id", ["info", "author"]],
    record_prefix="item_",
    meta_prefix="meta_",
)
```

The function uses targeted recursive extraction via `_pull_records`, traversing only the required branches rather than fully flattening the entire hierarchy.
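To make the prefixing concrete, the sketch below reruns the same call on the same sample records and inspects the result. Each entry in an `items` list becomes one row, and the parent fields are repeated alongside it.

```python
import pandas as pd

data = [
    {"id": 1, "info": {"author": "Alice"}, "items": [{"sku": "A", "qty": 2},
                                                    {"sku": "B", "qty": 5}]},
    {"id": 2, "info": {"author": "Bob"}, "items": [{"sku": "C", "qty": 1}]},
]

df = pd.json_normalize(
    data,
    record_path="items",
    meta=["id", ["info", "author"]],
    record_prefix="item_",
    meta_prefix="meta_",
)

# Record fields get record_prefix, parent fields get meta_prefix.
print(df.columns.tolist())
# ['item_sku', 'item_qty', 'meta_id', 'meta_info.author']
print(len(df))  # 3
```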

### Streaming with Chunk Processing

```python
import pandas as pd

reader = pd.read_json("big_file.json", lines=True, chunksize=100_000)
for chunk in reader:
    # Process each chunk independently
    print(chunk.shape)
```

The `JsonReader` object yields DataFrame chunks through its `__next__` method, so peak memory depends on the chunk size rather than the file size, provided you do not accumulate the chunks.

## Summary

- **Default automatic flattening**: Call `pd.json_normalize(records)` without normalization arguments to trigger the `_simple_json_normalize` fast path in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py) (lines 558-572). `pd.read_json(path)` alone parses the file but leaves nested objects unflattened.
- **Complex extractions**: Use `pd.json_normalize()` with explicit `record_path` and `meta` parameters for targeted recursive extraction of nested sub-arrays.
- **Memory efficiency**: Process massive files using `lines=True` together with `chunksize`, so that `JsonReader` streams line-delimited JSON without loading the entire dataset.
- **Engine optimization**: The default `ujson` C engine in [`pandas/_libs/src/vendored/ujson/python/ujson.c`](https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/src/vendored/ujson/python/ujson.c) handles parsing for every `orient`; the optional `pyarrow` engine is available for line-delimited files only.

## Frequently Asked Questions

### What makes `_simple_json_normalize` faster than regular `json_normalize`?

`_simple_json_normalize` (lines 558-572 in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py)) performs a single recursive walk optimized for dictionary flattening without constructing intermediate metadata dictionaries or handling prefix arguments. The standard `json_normalize` supports complex field extraction and metadata propagation, which requires additional overhead for parameter processing and recursive record pulling.

### When should I use `lines=True` with `pd.read_json`?

Use `lines=True` when your file contains line-delimited JSON (one JSON object per line). On its own this setting still reads the whole file, but it is the prerequisite for `chunksize` and `nrows`, which let `JsonReader` consume the file sequentially and reduce memory usage compared to loading an entire JSON array at once. For large datasets, `lines=True` combined with `chunksize` is the most memory-efficient configuration offered by the [`pandas/io/json/_json.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_json.py) implementation.

### How does pandas handle deeply nested objects during flattening?

Both flattening paths traverse nested dictionaries recursively: `_simple_json_normalize` relies on `_normalize_json_ordered`, while the full `json_normalize` path uses `nested_to_record` (located around lines 70-78 in [`pandas/io/json/_normalize.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_normalize.py)). In both cases nested keys become dot-separated column names (e.g., `info.author`). With `json_normalize`, you can control recursion depth using the `max_level` parameter; supplying it bypasses the fast path, which otherwise flattens all levels unconditionally.
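The naming scheme can be seen in a short sketch. The record is hypothetical, and the `sep` parameter of `json_normalize` (not discussed above) is used here only to show that the separator between nesting levels is configurable.

```python
import pandas as pd

# Hypothetical record nested two levels deep.
records = [{"id": 1, "info": {"author": "Alice", "address": {"city": "Oslo"}}}]

# Default separator: nested keys are joined with dots.
default_cols = pd.json_normalize(records).columns.tolist()

# A custom separator avoids dots, which can be awkward in attribute-style access.
underscore_cols = pd.json_normalize(records, sep="_").columns.tolist()

print(default_cols)     # ['id', 'info.author', 'info.address.city']
print(underscore_cols)  # ['id', 'info_author', 'info_address_city']
```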

### Can I process JSON files larger than system memory?

Yes, by passing `lines=True` together with the `chunksize` parameter to `pd.read_json()`, which then returns a `JsonReader` iterator implemented in [`pandas/io/json/_json.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/json/_json.py) (lines 990-1014). This approach yields DataFrame chunks of the specified row count without loading the entire file, allowing you to process very large line-delimited JSON files on limited hardware by iterating through chunks in a `for` loop.