How to Read JSON in Pandas: The Complete Guide to `read_json`
Use pandas.read_json() to convert JSON strings, files, or URLs into a DataFrame by specifying the source path and optional parameters like orient and lines to control parsing behavior.
The pandas.read_json function is the primary entry point for ingesting JSON data into pandas DataFrames. Located in the pandas-dev/pandas repository, this versatile loader handles everything from local files to remote URLs and newline-delimited JSON streams. Whether you are working with simple column-oriented objects or complex nested structures, understanding how to read JSON in pandas efficiently is essential for modern data workflows.
Understanding the read_json Implementation
The core implementation of pandas.read_json resides in pandas/io/json/_json.py, specifically around line 440. This function serves as the main entry point that orchestrates the conversion from JSON to DataFrame through a four-stage pipeline:
- **Argument validation** – The function checks for conflicting parameters (for example, incompatible `orient` values when `lines=True` is set) and raises `ValueError` with descriptive messages.
- **Raw JSON ingestion** – Depending on the input type, the function uses `json.load` for file objects, `json.loads` for strings, or `pandas.io.common.urlopen` for remote URLs.
- **Normalization** – The private helper `_json_normalize` reshapes nested structures according to the selected `orient` parameter.
- **DataFrame construction** – The parsed data is passed to `DataFrame(data, dtype=dtype, ...)`, followed by post-processing steps like `convert_dates` and `convert_axes`.
Because the implementation relies on Python's standard json module, it remains portable across all supported Python versions without requiring external JSON libraries.
Reading JSON from Different Sources
Local Files
The most common use case for pandas.read_json is loading data from a local file path. By default, the function expects a column-oriented JSON structure.
```python
import pandas as pd

df = pd.read_json("data/example.json")
print(df.head())
```
This approach assumes example.json contains a JSON object where keys are column names and values are arrays of data, which corresponds to the default orient='columns' setting.
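To see the default columns orientation without a file on disk, the same structure can be fed from an in-memory buffer (the data here is illustrative):

```python
import pandas as pd
from io import StringIO

# keys become column names, arrays become column values (orient="columns")
payload = '{"name": ["ada", "bob"], "score": [95, 87]}'
df = pd.read_json(StringIO(payload))
print(df)
```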
URLs
You can pass a URL string directly to read_json. The function delegates network fetching to pandas.io.common.urlopen before parsing the content.
```python
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/json/iris.json"
df = pd.read_json(url)
print(df.head())
```
This method handles the HTTP request transparently, then processes the returned JSON exactly like a local file.
Newline-Delimited JSON (NDJSON)
For streaming data or log files where each line is a separate JSON record, use the lines=True parameter. This mode reads the file line-by-line, parsing each line as an individual JSON object.
```python
import pandas as pd

df = pd.read_json("data/records.ndjson", lines=True)
print(df.head())
```
Setting lines=True is essential for NDJSON formats and automatically selects orient='records' behavior for each line.
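Because `read_json` accepts any file-like object, an NDJSON payload can be tested inline without touching the filesystem (illustrative data):

```python
import pandas as pd
from io import StringIO

# two NDJSON records: one JSON object per line
ndjson = '{"event": "login", "user": 1}\n{"event": "click", "user": 2}\n'
df = pd.read_json(StringIO(ndjson), lines=True)
print(df)
```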
Controlling JSON Structure with the orient Parameter
The orient parameter dictates how pandas maps JSON objects to DataFrame rows and columns. The implementation in pandas/io/json/_json.py supports six distinct orientations:
- `'columns'` (default): Expects a JSON object where each key is a column name and each value is an array of column values.
- `'records'`: Expects an array of JSON objects, where each object represents a row with column names as keys.
- `'index'`: Expects a JSON object where keys are index values and values are objects containing column data.
- `'split'`: Expects a JSON object with separate `columns`, `index`, and `data` arrays, minimizing JSON size for large datasets.
- `'values'`: Expects a nested array of values without column or index labels.
- `'table'`: Expects a JSON object following the Table Schema specification with `schema` and `data` fields.
Example using orient='split':
```python
import pandas as pd
from io import StringIO

json_str = """
{
    "columns": ["A", "B", "C"],
    "index": [0, 1],
    "data": [[1, 2, 3], [4, 5, 6]]
}
"""
# Recent pandas versions expect a file-like object rather than a raw
# JSON string, so wrap the literal in StringIO
df = pd.read_json(StringIO(json_str), orient="split")
print(df)
```
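The records orientation, common in API responses, is worth a parallel sketch: the input is an array of row objects rather than a column mapping (data here is made up):

```python
import pandas as pd
from io import StringIO

# each object in the array becomes one DataFrame row
records = '[{"A": 1, "B": 2}, {"A": 4, "B": 5}]'
df = pd.read_json(StringIO(records), orient="records")
print(df)
```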
Advanced read_json Options
Handling Compression
The read_json function accepts a compression parameter that delegates to pandas.io.common for on-the-fly decompression. Supported formats include gzip, bz2, xz, zip, and zstd.
```python
import pandas as pd

df = pd.read_json("data/compressed.json.gz", compression="gzip")
```
This eliminates the need to manually decompress files before loading.
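A quick way to verify this end to end is to write a gzip-compressed JSON file to a temporary path and load it back directly (paths and data here are illustrative):

```python
import gzip
import os
import tempfile

import pandas as pd

# write a small column-oriented JSON document, gzip-compressed
path = os.path.join(tempfile.mkdtemp(), "example.json.gz")
with gzip.open(path, "wt") as f:
    f.write('{"a": [1, 2], "b": [3, 4]}')

# read it back without manual decompression
df = pd.read_json(path, compression="gzip")
print(df)
```

With `compression="infer"` (the default), the `.gz` extension alone is enough to trigger decompression.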
Enforcing Data Types
Use the dtype parameter to specify column types after JSON materialization but before final DataFrame construction. This overrides pandas' default type inference.
```python
import pandas as pd

df = pd.read_json(
    "data/mixed_types.json",
    dtype={"id": "int64", "value": "float64"}
)
print(df.dtypes)
```
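With inline data, the effect of dtype is easy to inspect; here `id` is forced to int32 instead of the int64 pandas would otherwise infer (a minimal sketch with made-up data):

```python
import pandas as pd
from io import StringIO

payload = '{"id": [1, 2], "value": [0.5, 1.5]}'
# override inference: store id as int32 rather than the default int64
df = pd.read_json(StringIO(payload), dtype={"id": "int32"})
print(df.dtypes)
```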
Date Parsing
By default, convert_dates=True attempts to infer and convert ISO-8601 date strings to datetime64[ns] objects. You can disable this or specify exact columns to parse using the convert_dates parameter.
```python
import pandas as pd

df = pd.read_json("data/dates.json", convert_dates=["timestamp"])
```
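The same behavior can be checked inline: passing a column name in convert_dates turns its ISO-8601 strings into datetimes while other columns are left alone (illustrative data):

```python
import pandas as pd
from io import StringIO

payload = '{"timestamp": ["2024-01-01", "2024-01-02"], "note": ["a", "b"]}'
# parse only the listed column as datetime
df = pd.read_json(StringIO(payload), convert_dates=["timestamp"])
print(df.dtypes)
```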
Summary
- `pandas.read_json` in `pandas/io/json/_json.py` is the primary interface for loading JSON data into DataFrames.
- The function supports multiple sources: local files, URL strings, and file-like objects, with automatic handling via `pandas.io.common`.
- Use the `orient` parameter to specify JSON structure: `columns` (default), `records`, `index`, `split`, `values`, or `table`.
- Enable `lines=True` for newline-delimited JSON (NDJSON) streams.
- Leverage `compression` for automatic decompression of `gzip`, `bz2`, and other formats.
- Control data types with `dtype` and date parsing with `convert_dates`.
Frequently Asked Questions
What is the default orientation for pandas.read_json?
The default orient is 'columns', which expects a JSON object where keys are column names and values are arrays of data. This format minimizes redundancy when columns contain many repeated values and is the standard output format when exporting DataFrames to JSON using to_json().
How do I read a JSON file line by line in pandas?
Set lines=True when calling read_json. This mode treats each line as a separate JSON record, which is the standard format for newline-delimited JSON (NDJSON) and JSON Lines files commonly used in streaming data pipelines and log aggregation systems.
Can read_json handle compressed files?
Yes, the compression parameter accepts values like 'gzip', 'bz2', 'xz', 'zip', and 'zstd'. The function delegates decompression to pandas.io.common, allowing you to read compressed JSON files directly without manual extraction.
Where is the read_json function implemented in the pandas source code?
The implementation resides in pandas/io/json/_json.py around line 440. This file contains the core logic for argument validation, JSON parsing via the standard library json module, normalization through _json_normalize, and final DataFrame construction.