How to Read JSON in Pandas: The Complete Guide to `read_json`

Use pandas.read_json() to convert JSON strings, files, or URLs into a DataFrame by specifying the source path and optional parameters like orient and lines to control parsing behavior.

The pandas.read_json function is the primary entry point for ingesting JSON data into pandas DataFrames. Located in the pandas-dev/pandas repository, this versatile loader handles everything from local files to remote URLs and newline-delimited JSON streams. Whether you are working with simple column-oriented objects or complex nested structures, understanding how to read JSON in pandas efficiently is essential for modern data workflows.

Understanding the read_json Implementation

The core implementation of pandas.read_json resides in pandas/io/json/_json.py, specifically around line 440. This function serves as the main entry point that orchestrates the conversion from JSON to DataFrame through a four-stage pipeline:

  1. Argument validation – The function checks for conflicting parameters (for example, incompatible orient values when lines=True is set) and raises ValueError with descriptive messages.
  2. Raw JSON ingestion – pandas.io.common resolves the source (a local path, a file-like object, or a URL, the latter fetched via pandas.io.common.urlopen) and the raw text is decoded by pandas' bundled ujson-based parser.
  3. Normalization – Internal parser classes in _json.py reshape the decoded data according to the selected orient parameter.
  4. DataFrame construction – The parsed data is passed to DataFrame(data, dtype=dtype, ...), followed by post-processing steps like convert_dates and convert_axes.

Because parsing is handled by pandas' bundled ujson-based C extension, read_json requires no external JSON libraries and behaves consistently across all supported Python versions.

Reading JSON from Different Sources

Local Files

The most common use case for pandas.read_json is loading data from a local file path. By default, the function expects a column-oriented JSON structure.

import pandas as pd

df = pd.read_json("data/example.json")
print(df.head())

This approach assumes example.json contains a JSON object where keys are column names and values are arrays of data, which corresponds to the default orient='columns' setting.

URLs

You can pass a URL string directly to read_json. The function delegates network fetching to pandas.io.common.urlopen before parsing the content.

import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/json/iris.json"
df = pd.read_json(url)
print(df.head())

This method handles the HTTP request transparently, then processes the returned JSON exactly like a local file.

Newline-Delimited JSON (NDJSON)

For streaming data or log files where each line is a separate JSON record, use the lines=True parameter. This mode reads the file line-by-line, parsing each line as an individual JSON object.

import pandas as pd

df = pd.read_json("data/records.ndjson", lines=True)
print(df.head())

Setting lines=True is required for NDJSON input; in this mode only orient='records' is supported, and each line is parsed as one row object.
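For a self-contained illustration, io.StringIO can stand in for the file on disk:

```python
import io
import pandas as pd

# Each line is an independent JSON object (NDJSON / JSON Lines).
ndjson = '{"id": 1, "event": "login"}\n{"id": 2, "event": "logout"}\n'
df = pd.read_json(io.StringIO(ndjson), lines=True)
print(df)
```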

Controlling JSON Structure with the orient Parameter

The orient parameter dictates how pandas maps JSON objects to DataFrame rows and columns. The implementation in pandas/io/json/_json.py supports six distinct orientations:

  • 'columns' (default): Expects a JSON object where each key is a column name and each value maps index labels to cell values (a plain array of column values also parses).
  • 'records': Expects an array of JSON objects, where each object represents a row with column names as keys.
  • 'index': Expects a JSON object where keys are index values and values are objects containing column data.
  • 'split': Expects a JSON object with separate columns, index, and data arrays, minimizing JSON size for large datasets.
  • 'values': Expects a nested array of values without column or index labels.
  • 'table': Expects a JSON object following the Table Schema specification with schema and data fields.
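As a quick illustration of the 'records' form, the same two-row frame is expressed as an array of row objects (io.StringIO keeps the snippet self-contained):

```python
import io
import pandas as pd

# orient='records': one JSON object per row, column names as keys.
records_json = '[{"A": 1, "B": 2}, {"A": 3, "B": 4}]'
df = pd.read_json(io.StringIO(records_json), orient="records")
print(df)
```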

Example using orient='split':

import io
import pandas as pd

json_str = """
{
    "columns": ["A", "B", "C"],
    "index": [0, 1],
    "data": [[1, 2, 3], [4, 5, 6]]
}
"""
# Since pandas 2.1, literal JSON strings should be wrapped in a StringIO.
df = pd.read_json(io.StringIO(json_str), orient="split")
print(df)

Advanced read_json Options

Handling Compression

The read_json function accepts a compression parameter that delegates to pandas.io.common for on-the-fly decompression. Supported formats include gzip, bz2, xz, zip, and zstd; by default (compression='infer') the format is inferred from the file extension.

import pandas as pd

df = pd.read_json("data/compressed.json.gz", compression="gzip")

This eliminates the need to manually decompress files before loading.
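A self-contained round trip, using tempfile to stand in for a real data directory:

```python
import gzip
import os
import tempfile
import pandas as pd

# Write a small gzip-compressed JSON file, then read it back directly.
path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(path, "wt") as fh:
    fh.write('{"a": [1, 2], "b": [3, 4]}')

# compression defaults to 'infer', so the .gz suffix would be detected
# automatically; compression="gzip" makes the same choice explicit.
df = pd.read_json(path, compression="gzip")
print(df)
```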

Enforcing Data Types

Use the dtype parameter to specify column types explicitly after the JSON has been parsed, overriding pandas' default type inference.

import pandas as pd

df = pd.read_json(
    "data/mixed_types.json", 
    dtype={"id": "int64", "value": "float64"}
)
print(df.dtypes)
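A runnable variant with inline data (the field names here are illustrative, not from any particular dataset):

```python
import io
import pandas as pd

# "value" arrives as JSON strings; dtype coerces it to float64.
data = '[{"id": 1, "value": "2.5"}, {"id": 2, "value": "3.5"}]'
df = pd.read_json(io.StringIO(data), dtype={"id": "int64", "value": "float64"})
print(df.dtypes)
```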

Date Parsing

By default, convert_dates=True attempts to parse columns with date-like names (such as date or timestamp, or names ending in _at or _time) into datetime64[ns] values. Pass a list of column names to convert exactly those columns, or False to disable date conversion entirely.

import pandas as pd

df = pd.read_json("data/dates.json", convert_dates=["timestamp"])
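A runnable sketch of the difference (the created_at column name is illustrative):

```python
import io
import pandas as pd

data = '[{"created_at": "2024-01-15T12:00:00", "value": 1}]'

# Explicit list: only "created_at" is considered for date conversion.
parsed = pd.read_json(io.StringIO(data), convert_dates=["created_at"])
# convert_dates=False leaves the ISO strings untouched.
raw = pd.read_json(io.StringIO(data), convert_dates=False)

print(parsed["created_at"].dtype)  # a datetime64 dtype
print(raw["created_at"].dtype)     # plain object (strings)
```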

Summary

  • pandas.read_json in pandas/io/json/_json.py is the primary interface for loading JSON data into DataFrames.
  • The function supports multiple sources: local files, URL strings, and file-like objects, with automatic handling via pandas.io.common.
  • Use the orient parameter to specify JSON structure: columns (default), records, index, split, values, or table.
  • Enable lines=True for newline-delimited JSON (NDJSON) streams.
  • Leverage compression for automatic decompression of gzip, bz2, and other formats.
  • Control data types with dtype and date parsing with convert_dates.

Frequently Asked Questions

What is the default orientation for pandas.read_json?

The default orient is 'columns', which expects a JSON object where keys are column names and values map index labels to cell values. This is also the default output format when exporting a DataFrame with to_json(), so the two functions round-trip cleanly.
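A quick round trip showing the defaults agree (io.StringIO avoids the deprecated literal-string input):

```python
import io
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
json_str = df.to_json()  # orient='columns' is the DataFrame default
restored = pd.read_json(io.StringIO(json_str))
print(restored.equals(df))
```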

How do I read a JSON file line by line in pandas?

Set lines=True when calling read_json. This mode treats each line as a separate JSON record, which is the standard format for newline-delimited JSON (NDJSON) and JSON Lines files commonly used in streaming data pipelines and log aggregation systems.

Can read_json handle compressed files?

Yes, the compression parameter accepts values like 'gzip', 'bz2', 'xz', 'zip', and 'zstd'. The function delegates decompression to pandas.io.common, allowing you to read compressed JSON files directly without manual extraction.

Where is the read_json function implemented in the pandas source code?

The implementation resides in pandas/io/json/_json.py around line 440. This file contains the core logic for argument validation, JSON parsing via the standard library json module, normalization through _json_normalize, and final DataFrame construction.
