How to Use the Spark Show Command to Display DataFrames in Table Format

The spark show command invokes the show() method on a DataFrame (an alias for Dataset[Row]) to render distributed data as a formatted ASCII table, with optional parameters to control row count, string truncation, and vertical layout.

In the apache/spark repository, the spark show command serves as the primary debugging and inspection tool for DataFrame contents. The method is implemented within the Dataset API and prints partitioned data as a readable console table while bringing only the requested rows to the driver, not the full dataset.

How the Spark Show Command Works Internally

Understanding the internal execution flow helps optimize performance when displaying large datasets.

Entry Points in the Dataset API

The public spark show command surface resides in sql/api/src/main/scala/org/apache/spark/sql/Dataset.scala (lines 58-77). The simplest overload def show(): Unit = show(20) defaults to displaying 20 rows with string truncation enabled. Additional overloads accept numRows, truncate, and vertical parameters to customize output behavior.
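The delegation between overloads can be pictured with a small stand-in class. This is a simplified sketch, not Spark's source: ShowSketch and its lastCall field are hypothetical names introduced here so the resolved arguments can be inspected, and it assumes that a Boolean truncate maps to a width of 20 (on) or 0 (off).

```scala
// Hypothetical stand-in for the overload chain; not Spark's actual class.
class ShowSketch {
  // Records the fully resolved (numRows, truncate, vertical) arguments.
  var lastCall: Option[(Int, Int, Boolean)] = None

  def show(): Unit = show(20)
  def show(numRows: Int): Unit = show(numRows, true)
  def show(truncate: Boolean): Unit = show(20, truncate)
  // Assumed mapping: true means a width of 20, false means 0 (no truncation).
  def show(numRows: Int, truncate: Boolean): Unit =
    if (truncate) show(numRows, 20) else show(numRows, 0)
  def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, false)
  // The real method would format and print; the sketch only records the call.
  def show(numRows: Int, truncate: Int, vertical: Boolean): Unit =
    lastCall = Some((numRows, truncate, vertical))
}
```

Calling show() on an instance of this sketch ends up recording (20, 20, false), which is the default behavior described above.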

String Formatting with showString

Each show method delegates to the private showString implementation located in sql/core/src/main/scala/org/apache/spark/sql/classic/Dataset.scala (lines 68-78). This helper constructs the printable representation by normalizing the requested row count, invoking getRows to collect data as Seq[Seq[String]], and calculating column widths to pad cells with | separators.
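The width calculation and | separators can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the showString implementation: TableSketch and render are hypothetical names, the first inner sequence is treated as the header, and cells are left-padded so they appear right-aligned as in the sample output in this article.

```scala
object TableSketch {
  // Renders a header row plus data rows as an ASCII table.
  def render(rows: Seq[Seq[String]]): String = {
    val minWidth = 3 // assumed minimum column width
    val numCols = rows.head.length
    // Each column is as wide as its widest cell, header included.
    val widths = (0 until numCols).map(i => math.max(minWidth, rows.map(_(i).length).max))
    val sep = widths.map(w => "-" * w).mkString("+", "+", "+")
    // Left-pad every cell to its column width and join with | separators.
    def line(row: Seq[String]): String =
      row.zip(widths).map { case (cell, w) => (" " * (w - cell.length)) + cell }
        .mkString("|", "|", "|")
    val body = rows.tail.map(line)
    (Seq(sep, line(rows.head), sep) ++ body ++ Seq(sep)).mkString("\n")
  }
}
```

For example, TableSketch.render(Seq(Seq("year", "month"), Seq("2018", "January"))) yields a four-line table whose columns are 4 and 7 characters wide.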

Row Extraction and Padding

The getRows method extracts values and converts them to strings. Padding utilities in sql/core/src/main/scala/org/apache/spark/sql/util/Utils.scala handle alignment: in the standard layout, cells are left-padded (right-aligned) when truncation is enabled and right-padded (left-aligned) when it is disabled, while vertical mode prints each record under a -RECORD n- separator with field names padded to a common width. When the DataFrame contains more rows than requested, the implementation appends the footer "only showing top N rows" to indicate partial results.
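One common way to detect partial results without counting the whole dataset is to fetch one row more than requested. The sketch below shows that idea in plain Scala under that assumption; FooterSketch and preview are hypothetical names, and an Iterator stands in for the distributed data.

```scala
object FooterSketch {
  // Returns the rows to print plus an optional footer line.
  def preview[A](data: Iterator[A], numRows: Int): (Seq[A], Option[String]) = {
    val n = math.max(numRows, 0) // negative requests are treated as zero
    // Fetch one extra row: its presence signals that more data exists.
    val fetched = data.take(n + 1).toList
    val hasMore = fetched.length > n
    val rowsWord = if (n == 1) "row" else "rows"
    val footer = if (hasMore) Some(s"only showing top $n $rowsWord") else None
    (fetched.take(n), footer)
  }
}
```

With four input elements and numRows = 2, the sketch returns the first two elements and the footer "only showing top 2 rows"; with fewer elements than requested, no footer is produced.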

Spark Show Command Syntax and Parameters

The Scala API overloads show(). The forms used most often are:

  1. show() – Display 20 rows, truncate strings longer than 20 characters.
  2. show(numRows: Int) – Specify row count, keep default truncation.
  3. show(truncate: Boolean) – Toggle truncation (false shows full strings), keep 20 rows.
  4. show(numRows: Int, truncate: Boolean) – Specify row count and toggle truncation.
  5. show(numRows: Int, truncate: Int, vertical: Boolean) – Full control: exact truncation length, row count, and vertical layout.

When truncate is an integer greater than 0, strings exceeding that length are shortened to truncate-3 characters plus an ellipsis (...). Passing false, or an integer truncate of 0, disables truncation entirely.
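The truncation rule can be written as a small pure function. This is a sketch of the rule as described here, not Spark's code: TruncateSketch and truncateCell are hypothetical names, and the hard cut for widths below 4 (where an ellipsis would not fit) is an assumption.

```scala
object TruncateSketch {
  // Shortens a cell to at most `truncate` characters; 0 or less means no limit.
  def truncateCell(s: String, truncate: Int): String =
    if (truncate > 0 && s.length > truncate) {
      // Assumed: widths below 4 leave no room for "...", so cut hard.
      if (truncate < 4) s.substring(0, truncate)
      else s.substring(0, truncate - 3) + "..."
    } else s
}
```

For instance, truncateCell("abcdefghijkl", 10) keeps the first seven characters and appends the ellipsis, so the result is exactly 10 characters long.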

Practical Examples of the Spark Show Command

Basic Usage

Create a DataFrame and display it with default settings:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ShowExample")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val df = Seq(
  (2018, "January", 0.503218, 0.595103),
  (2019, "February", 0.523289, 0.570307),
  (2020, "March", 0.436504, 0.475256)
).toDF("year", "month", "avg_adj_close", "max_adj_close")

// Default: 20 rows, truncate at 20 chars
df.show()

Output:


+----+--------+-------------+-------------+
|year|   month|avg_adj_close|max_adj_close|
+----+--------+-------------+-------------+
|2018| January|     0.503218|     0.595103|
|2019|February|     0.523289|     0.570307|
|2020|   March|     0.436504|     0.475256|
+----+--------+-------------+-------------+

Controlling Row Count and Truncation

Display 5 rows without truncation:

df.show(numRows = 5, truncate = false)

Or limit strings to exactly 10 characters:

df.show(numRows = 5, truncate = 10)

Vertical Display Mode

For wide DataFrames, use vertical layout to prevent horizontal scrolling:

df.show(numRows = 3, truncate = 20, vertical = true)

Output:


-RECORD 0-----------------
 year          | 2018
 month         | January
 avg_adj_close | 0.503218
 max_adj_close | 0.595103
-RECORD 1-----------------
 year          | 2019
 month         | February
 ...
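The vertical layout can be approximated in plain Scala as well. This is a hedged sketch rather than Spark's implementation: VerticalSketch and render are hypothetical names, and the width arithmetic (including the extra 5 characters on the record separator) is an assumption that may not match Spark's output character for character.

```scala
object VerticalSketch {
  // Prints one "name | value" line per field under a -RECORD n- separator.
  def render(fieldNames: Seq[String], rows: Seq[Seq[String]]): String = {
    val minWidth = 3 // assumed minimum width for both columns
    val nameWidth = (fieldNames.map(_.length) :+ minWidth).max
    val dataWidth = (rows.flatten.map(_.length) :+ minWidth).max
    val out = rows.zipWithIndex.flatMap { case (row, i) =>
      // Separator padded with dashes to the assumed full line width.
      val header = ("-RECORD " + i).padTo(nameWidth + dataWidth + 5, '-')
      val fields = fieldNames.zip(row).map { case (name, cell) =>
        // Field names are right-padded so the | separators line up.
        " " + name.padTo(nameWidth, ' ') + " | " + cell
      }
      header +: fields
    }
    out.mkString("\n")
  }
}
```

Because every record repeats the field names, this layout trades vertical space for width, which is why it suits DataFrames with many columns.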

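Summary

The spark show command prints a DataFrame as an ASCII table, defaulting to 20 rows and truncating strings longer than 20 characters. The numRows, truncate, and vertical parameters adjust the row count, cell width, and layout. The public overloads in the Dataset API delegate to showString, which formats the rows and appends a footer when the DataFrame holds more rows than were requested.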

Frequently Asked Questions

What is the default number of rows displayed by the spark show command?

By default, the spark show command displays 20 rows. This default is hard-coded in the show() method overload located in sql/api/src/main/scala/org/apache/spark/sql/Dataset.scala, which internally calls show(20).

How does the truncate parameter work in the spark show command?

The truncate parameter controls string length in cell output. When set to true (default), strings longer than 20 characters are truncated. When passed as an integer, strings exceeding that length are shortened to truncate-3 characters plus an ellipsis (...). Setting truncate to false disables truncation entirely, displaying full strings.

Can I use the spark show command in PySpark?

Yes. Although the underlying implementation is Scala-based in the apache/spark repository, PySpark exposes the same show() method on DataFrame objects. In place of the Scala overloads, the Python method accepts n (numRows), truncate, and vertical as optional arguments, and delegates to the same internal showString logic.

Why does the spark show command print "only showing top N rows"?

This footer appears when the DataFrame contains more rows than the numRows parameter specifies. The showString implementation in sql/core/src/main/scala/org/apache/spark/sql/classic/Dataset.scala detects that additional rows exist and appends the message to indicate the output is a partial view.
