How to Plot a Histogram of a Column in pandas: A Complete Guide to Distribution Visualization
Use Series.hist() or DataFrame.hist() to plot histograms of pandas columns, which automatically handle missing values, numeric validation, and Matplotlib integration while providing high-level parameters for bins, grouping, and styling.
To visualize the distribution of data in pandas effectively, you need to understand how the library wraps Matplotlib's histogram functionality with pandas-specific data handling. The pandas-dev/pandas repository provides a robust plotting architecture that automatically manages data preparation, figure creation, and styling when you plot a histogram of a column in pandas.
Understanding the pandas Histogram Architecture
The histogram implementation in pandas follows a clear dispatch pattern that bridges high-level user APIs with low-level Matplotlib rendering.
Entry Point and Dispatch Mechanism
When you call Series.hist(), the call goes to pandas.plotting.hist_series (defined in pandas/plotting/_core.py). That function looks up the configured plotting backend and forwards the arguments to hist_series in pandas/plotting/_matplotlib/hist.py. This backend function is where pandas data structures are turned into a Matplotlib histogram.
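You can check the first hop of this dispatch from a Python session. The sketch below assumes a recent pandas release with the default Matplotlib plotting backend. It only inspects attributes and does not draw anything.

```python
import pandas as pd

# Series.hist and DataFrame.hist are class attributes that point at the
# public functions exposed by pandas.plotting
print(pd.Series.hist is pd.plotting.hist_series)
print(pd.DataFrame.hist is pd.plotting.hist_frame)

# This option selects the backend that receives the forwarded call;
# "matplotlib" resolves to the pandas.plotting._matplotlib module
print(pd.get_option("plotting.backend"))
```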
Data Preparation and Validation
Before rendering, pandas cleans the data. The hist_series function drops missing values with self.dropna().values before it passes the array to Matplotlib. For DataFrames, hist_frame uses select_dtypes to keep only numeric and datetime columns, and it raises a ValueError if no plottable column remains.
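You can observe both behaviours directly. This sketch uses a tiny hand-made Series and an all-text DataFrame as illustrative data. It selects Matplotlib's non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Series.hist: NaN values are dropped before ax.hist() receives the data
s = pd.Series([1.0, 2.0, np.nan, 2.5, np.nan])
fig, ax = plt.subplots()
s.hist(bins=3, ax=ax)
bar_heights = [patch.get_height() for patch in ax.patches]
print(sum(bar_heights))  # only the non-missing values are counted

# DataFrame.hist: columns that are neither numeric nor datetime are
# filtered out, and a frame with nothing left to plot raises ValueError
text_only = pd.DataFrame({"name": ["a", "b", "c"]})
try:
    text_only.hist()
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```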
How to Plot a Histogram of a Column in pandas: Basic Syntax
The simplest approach uses the hist() method directly on a Series object.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create sample data
rng = np.random.default_rng(0)
df = pd.DataFrame({
"age": rng.integers(18, 80, size=500),
"salary": rng.normal(50000, 15000, size=500)
})
# Basic histogram of a single column
df["age"].hist(bins=20, edgecolor="black")
plt.title("Age Distribution")
plt.xlabel("Age")
plt.show()
This code exercises the hist_series implementation in pandas/plotting/_matplotlib/hist.py, which ultimately calls ax.hist(values, bins=bins, **kwds).
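The sketch below shows roughly what that wrapper does. It draws the same synthetic age column twice: once through Series.hist() and once by calling Matplotlib directly after dropping missing values. This approximates the code path without by. It is not a copy of the pandas source, and the Agg backend line is there only so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ages = pd.Series(rng.integers(18, 80, size=500), name="age")

fig, (ax_pandas, ax_mpl) = plt.subplots(1, 2, figsize=(10, 4))

# pandas route: Series.hist -> hist_series -> ax.hist(...)
ages.hist(bins=20, edgecolor="black", ax=ax_pandas)

# Approximately the same steps done by hand for the no-`by` case
values = ages.dropna().values
ax_mpl.hist(values, bins=20, edgecolor="black")
ax_mpl.grid(True)  # pandas turns the grid on by default (grid=True)

pandas_heights = [p.get_height() for p in ax_pandas.patches]
mpl_heights = [p.get_height() for p in ax_mpl.patches]
print(pandas_heights == mpl_heights)
```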
Advanced Histogram Customization
pandas provides extensive parameters for controlling binning, grouping, and visual styling without manually manipulating Matplotlib objects.
Customizing Bins and Range
Control the histogram granularity using the bins parameter, which accepts integers, sequences, or strings.
# Integer bins: 30 equal-width bins between the minimum and maximum salary
df["salary"].hist(bins=30)
plt.show()
# Custom bin edges using NumPy: 10,000-wide bins from 0 to 100,000
custom_bins = np.arange(0, 110000, 10000)
df["salary"].hist(bins=custom_bins, xlabelsize=12, xrot=45)
plt.title("Salary Distribution with Custom 10K Bins")
plt.show()
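Two further options are worth knowing. A string such as "auto" is passed through to Matplotlib, which delegates the bin estimate to NumPy. The range keyword belongs to Matplotlib, not pandas; it is forwarded through **kwds and restricts the interval that gets binned. Here is a minimal sketch, assuming the same synthetic salary data as above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
salary = pd.Series(rng.normal(50000, 15000, size=500), name="salary")

fig, (ax_auto, ax_range) = plt.subplots(1, 2, figsize=(10, 4))

# A string is handed to Matplotlib, which lets NumPy choose the edges
salary.hist(bins="auto", ax=ax_auto)
expected_counts, _ = np.histogram(salary, bins="auto")

# `range` travels through **kwds to ax.hist() and limits the 10 bins to
# 20,000-80,000; salaries outside that interval are not counted
salary.hist(bins=10, range=(20_000, 80_000), ax=ax_range)
in_range = salary.between(20_000, 80_000).sum()

print(len(ax_auto.patches), len(expected_counts))
print(sum(p.get_height() for p in ax_range.patches), in_range)
```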
Grouped Histograms with the by Parameter
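The by parameter creates faceted histograms, one subplot per group. For a Series, hist_series in pandas/plotting/_matplotlib/hist.py hands this case to the _grouped_hist helper in the same module. That helper groups the data and calls ax.hist() once per group.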
# Add categorical data
df["department"] = rng.choice(["HR", "Engineering", "Marketing"], size=500)
# Create separate histograms per department
df["salary"].hist(by=df["department"], bins=25, figsize=(12, 4), legend=True)
plt.suptitle("Salary Distribution by Department")
plt.show()
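With by, the return value is an array of Axes rather than a single Axes, which is handy for per-group tweaks. The sketch below assumes the same synthetic data. It passes layout=(1, 3) through **kwds to force a single row, then reads back each subplot's title and bar counts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "salary": rng.normal(50000, 15000, size=500),
    "department": rng.choice(["HR", "Engineering", "Marketing"], size=500),
})

# One subplot per department; `layout` fixes the grid to 1 row x 3 columns
axes = df["salary"].hist(by=df["department"], bins=25,
                         figsize=(12, 4), layout=(1, 3))

# Each subplot is titled with its group key; bar heights sum to group size
for ax in np.ravel(axes):
    group_size = sum(p.get_height() for p in ax.patches)
    print(ax.get_title(), int(group_size))
```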
Styling and Layout Options
When no ax parameter is provided, hist_series reuses the current figure via plt.gcf() (or creates one with plt.figure() if none is open) and draws on fig.gca(). Additional styling parameters include:
- grid: Boolean to show or hide grid lines (default True)
- figsize: Tuple specifying the figure dimensions in inches
- xlabelsize, ylabelsize: Font sizes for the x- and y-axis tick labels
- xrot, yrot: Rotation angles, in degrees, for the tick labels
These options are applied through helper utilities such as set_ticks_props and maybe_adjust_figure, both defined in pandas/plotting/_matplotlib/tools.py.
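Here is a short sketch that combines these options on the synthetic age column. The Agg backend and the specific values are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ages = pd.Series(rng.integers(18, 80, size=500), name="age")

plt.figure()  # open a figure so hist_series picks it up via plt.gcf()
ax = ages.hist(
    bins=20,
    grid=False,       # hide grid lines (default is True)
    figsize=(8, 3),   # resize the current figure to 8 x 3 inches
    xlabelsize=12,    # font size of the x tick labels
    xrot=45,          # rotate the x tick labels by 45 degrees
)

print(tuple(ax.figure.get_size_inches()))
print(any(line.get_visible() for line in ax.get_xgridlines()))
```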
Under the Hood: The Matplotlib Integration
The actual rendering occurs in pandas/plotting/_matplotlib/hist.py, where the hist_series function executes ax.hist(values, bins=bins, **kwds). This design hides the low-level Matplotlib API while preserving access to every Matplotlib histogram parameter through the **kwds argument.
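For example, density, alpha, and color are Matplotlib Axes.hist() keywords that pandas does not define itself, so they reach ax.hist() unchanged. Here is a minimal sketch, assuming the synthetic salary data used throughout:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
salary = pd.Series(rng.normal(50000, 15000, size=500), name="salary")

fig, ax = plt.subplots()
# density, alpha and color are not pandas parameters; they pass
# through **kwds into Matplotlib's ax.hist()
salary.hist(bins=30, density=True, alpha=0.6, color="tab:green", ax=ax)

# With density=True the bar areas sum to 1 instead of to the sample size
area = sum(p.get_height() * p.get_width() for p in ax.patches)
print(round(area, 6))
```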
The architecture ensures that:
- Missing values are automatically excluded via dropna()
- DataFrame.hist() plots only numeric and datetime columns; other dtypes are filtered out with select_dtypes
- Figure and axis management follows pandas conventions: without an ax argument, the current figure is reused or a new one is created
- Grid lines are drawn by default (grid=True)
Summary
- Use Series.hist() to plot a histogram of a column in pandas; the call is dispatched to pandas/plotting/_matplotlib/hist.py for rendering.
- Leverage automatic data handling: missing values are dropped, DataFrame.hist() skips columns that are neither numeric nor datetime, and axes are created automatically.
- Control the visualization with parameters like bins, by, figsize, and grid without manual Matplotlib configuration.
- For advanced customization, pass extra Matplotlib keywords; pandas forwards them to the ax.hist() call in pandas/plotting/_matplotlib/hist.py.
Frequently Asked Questions
How do I change the number of bins in a pandas histogram?
Pass an integer to the bins parameter in Series.hist() or DataFrame.hist(). For example, df['column'].hist(bins=30) creates 30 equal-width bins. You can also pass a sequence (like a NumPy array) to specify custom bin edges.
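If bins is omitted, Series.hist() uses 10 bins. The sketch below contrasts that default with four explicit, unequal bins on the synthetic age data. The edges themselves are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = pd.Series(rng.integers(18, 80, size=500), name="age")

fig, (ax_default, ax_edges) = plt.subplots(1, 2)
age.hist(ax=ax_default)                           # bins defaults to 10
age.hist(bins=[18, 30, 45, 60, 80], ax=ax_edges)  # 5 edges -> 4 bins

print(len(ax_default.patches), len(ax_edges.patches))
```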
Can I plot histograms for multiple columns at once?
Yes. Use DataFrame.hist() with the column parameter to specify which columns to plot. This creates a separate subplot for each column. The hist_frame function in pandas/plotting/_matplotlib/hist.py handles the layout and automatically keeps only numeric and datetime columns.
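Here is a sketch of both variants, assuming a frame like the one built earlier: two numeric columns plus a text department column. DataFrame.hist() returns a 2-D array of Axes, so .ravel() is used to walk it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=500),
    "salary": rng.normal(50000, 15000, size=500),
    "department": rng.choice(["HR", "Engineering", "Marketing"], size=500),
})

# Explicit column selection: one subplot per listed column
axes = df.hist(column=["age", "salary"], bins=20, figsize=(10, 4))
titles = [ax.get_title() for ax in axes.ravel()]
print(titles)

# Without `column`, the text column "department" is skipped
all_axes = df.hist(bins=20)
all_titles = [ax.get_title() for ax in all_axes.ravel() if ax.get_visible()]
print(all_titles)
```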
How do I create grouped histograms by category?
Use the by parameter in Series.hist(). Pass a categorical column or Series to create faceted histograms showing the distribution for each group. For example, df['salary'].hist(by=df['department']) generates separate histograms for each department.
Why does my pandas histogram look different from standard Matplotlib?
pandas wraps Matplotlib with additional preprocessing. It drops missing values, filters out columns that are neither numeric nor datetime in DataFrame.hist(), and applies default styling such as grid lines (grid=True). The actual rendering uses ax.hist() in pandas/plotting/_matplotlib/hist.py, but the wrapper adds pandas-specific behaviors, for example labeling the legend with the Series name when legend=True.