How to Process JSON at the Command Line with jq

TLDR: The jlevy/the-art-of-command-line repository recommends jq for processing JSON at the command line. Its key-sorted output and Unix-style I/O enable reliable diffing, filtering, and pipeline integration without temporary files.

The jlevy/the-art-of-command-line repository treats JSON as a first-class data format for shell workflows, explicitly recommending jq for all non-interactive JSON processing. According to the guide's "Processing files and data" section in README.md [2†L227-L228], jq embodies the Unix philosophy: it is a small, portable, composable tool that reads from stdin and writes to stdout, allowing seamless chaining with grep, awk, sed, diff, and colordiff.

Canonicalization and Diffing JSON Files

A core strength of jq is its ability to produce canonical JSON—output with deterministic key ordering and normalized whitespace—making it ideal for version control and diffing. In README.md lines 367-370 [3†L367-L370], the guide demonstrates how to compare two JSON files without creating temporary files using process substitution and the --sort-keys flag.

To normalize JSON for reliable comparison:

jq --sort-keys . data.json > canonical.json

This command guarantees reproducible object key order, removing formatting variations that would otherwise cause false positives in diffs. For direct file comparison, combine process substitution with diff:

diff <(jq --sort-keys . file1.json) <(jq --sort-keys . file2.json) | colordiff | less -R

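Here, each <( ) runs its command and hands diff a file-like path (such as /dev/fd/63 on Linux) from which it reads the canonicalized output, so no temporary files are written. colordiff adds color highlighting, and less -R passes the color codes through to the terminal. Process substitution requires bash, zsh, or ksh; it is not available in plain POSIX sh.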
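As a minimal sketch of what canonicalization buys you, the snippet below writes two hypothetical files (one.json and two.json, invented for illustration) that hold the same data with different key order and whitespace, then checks that their --sort-keys output is identical. It compares captured output rather than using process substitution so that it also runs under plain sh.

```shell
# Hypothetical sample files: same data, different key order and whitespace.
tmpdir=$(mktemp -d)
printf '{"b": 1, "a": {"d": 2, "c": 3}}\n' > "$tmpdir/one.json"
printf '{"a":{"c":3,"d":2},\n "b":1}\n' > "$tmpdir/two.json"

# --sort-keys orders the keys of every object, including nested ones.
canon_one=$(jq --sort-keys . "$tmpdir/one.json")
canon_two=$(jq --sort-keys . "$tmpdir/two.json")

if [ "$canon_one" = "$canon_two" ]; then
  echo "equivalent after canonicalization"
else
  echo "files differ"
fi

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

A raw diff of the two files would report differences on every line, even though the data is the same.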

Essential jq Operations for Data Extraction

jq uses a path-based query language to extract and transform JSON structures. Below are the fundamental patterns endorsed by the repository for daily command-line work.

Pretty-Printing and Formatting

The simplest jq operation formats compact or minified JSON into human-readable indentation:

jq . data.json | less -R

The identity filter . re-emits the input with consistent indentation. jq turns color off when its output is a pipe, so less -R shows colors only if jq is run with --color-output (-C), for example through a shell alias.
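A small sketch of the round trip, using an inline sample document (made up for illustration) instead of data.json: jq . expands minified JSON, and the -c flag collapses it back to one line.

```shell
# Inline sample document (hypothetical).
minified='{"a":1,"b":[1,2]}'

# Pretty-print: one key or array element per line, two-space indentation.
pretty=$(printf '%s\n' "$minified" | jq .)
printf '%s\n' "$pretty"

# -c (--compact-output) reverses the formatting.
compact=$(printf '%s\n' "$pretty" | jq -c .)
printf '%s\n' "$compact"
```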

Extracting Specific Fields

To stream a single field from every object in an array, use the iterator operator .[] combined with property access:

jq '.[] | .id' data.json

This outputs each id value on a new line, suitable for piping into xargs or other line-oriented tools.
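The following sketch runs the same filter against inline sample data (the id and name fields are illustrative assumptions). It also shows the shorthand .[].name and the effect of -r on string values, which matters when the output feeds line-oriented tools.

```shell
# Inline sample array (hypothetical records).
data='[{"id":1,"name":"alpha"},{"id":2,"name":"beta"},{"id":3,"name":"gamma"}]'

# One id per line.
ids=$(printf '%s\n' "$data" | jq '.[] | .id')
printf '%s\n' "$ids"

# Strings keep their JSON quotes by default...
names_json=$(printf '%s\n' "$data" | jq '.[].name')
# ...and lose them with -r (--raw-output).
names_raw=$(printf '%s\n' "$data" | jq -r '.[].name')
printf '%s\n' "$names_raw"
```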

Filtering Objects by Condition

The select() function passes through only those inputs for which a boolean expression is true; combined with .[], it filters the elements of an array:

jq '.[] | select(.status == "active")' data.json

Only objects where the status field equals "active" are emitted to stdout.
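As an illustration with inline sample data (the records are hypothetical), the sketch below prints the matching objects one per line with -c and then chains a further field access after select().

```shell
# Inline sample array (hypothetical records).
data='[{"id":1,"status":"active"},{"id":2,"status":"inactive"},{"id":3,"status":"active"}]'

# Matching objects, one compact object per line.
active=$(printf '%s\n' "$data" | jq -c '.[] | select(.status == "active")')
printf '%s\n' "$active"

# select() composes with further filters: keep only the ids of matches.
active_ids=$(printf '%s\n' "$data" | jq '.[] | select(.status == "active") | .id')
printf '%s\n' "$active_ids"
```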

Counting and Aggregation

To aggregate data before output, wrap an expression in the array constructor [ ] and apply a function such as length:

jq '[.[] | .id] | length' data.json

This collects every id into an array and returns its length, giving you the item count without external tools such as wc. When the input is itself an array, jq length data.json returns the same count directly.
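A short sketch of both forms on inline sample data (hypothetical records): a plain count of the top-level array, and a count of only the elements that pass a select() condition.

```shell
# Inline sample array (hypothetical records).
data='[{"id":1,"status":"active"},{"id":2,"status":"inactive"},{"id":3,"status":"active"}]'

# Count every element of the top-level array.
total=$(printf '%s\n' "$data" | jq 'length')

# Collect the matches into an array, then count them.
active_count=$(printf '%s\n' "$data" | jq '[.[] | select(.status == "active")] | length')

echo "total=$total active=$active_count"
```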

Integrating jq with Shell Pipelines

Because jq outputs valid JSON (or raw text with the -r flag), downstream tools that understand JSON—including Python one-liners, language-specific libraries, or subsequent jq invocations—can immediately consume the results. The repository highlights this composability by showing jq integrated with curl and xargs:

curl -s https://api.github.com/repos/jlevy/the-art-of-command-line/releases | jq -r '.[0].tag_name' | xargs -I{} echo "Latest tag: {}"

This pipeline fetches API data, extracts the latest release tag as a raw string (-r removes JSON quotes), and passes it to xargs for formatted output. This pattern exemplifies how jq serves as the bridge between web APIs and standard Unix text processing tools.
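The same bridging idea can be tried offline. The sketch below, which uses inline sample data rather than a live API (the records are hypothetical), converts each object to a tab-separated line with jq's @tsv format and lets awk do the filtering.

```shell
# Inline sample array standing in for an API response (hypothetical).
data='[{"id":1,"status":"active"},{"id":2,"status":"inactive"},{"id":3,"status":"active"}]'

# -r with @tsv emits one tab-separated line per object: id<TAB>status.
rows=$(printf '%s\n' "$data" | jq -r '.[] | [.id, .status] | @tsv')

# Ordinary text tools take over from here.
active_ids=$(printf '%s\n' "$rows" | awk -F'\t' '$2 == "active" { print $1 }')
printf '%s\n' "$active_ids"
```

Tab-separated output is a reasonable hand-off format because awk, cut, and sort can all split on tabs without further quoting rules.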

Interactive Alternatives to jq

While jq excels in scripts and non-interactive pipelines, the repository notes that exploratory work benefits from interactive tools. For manual data exploration, the guide references jid and jiq in lines 27-28 [2†L27-L28] of the README. These utilities provide terminal-based interfaces for real-time JSON querying, though jq remains the go-to utility for production scripts and automated workflows.

Summary

  • The jlevy/the-art-of-command-line repository explicitly recommends jq in README.md [2†L227-L228] as the standard command-line JSON processor.
  • Canonicalization via jq --sort-keys . enables deterministic diffing and version control by normalizing key order and whitespace.
  • Process substitution (<( )) allows diff to compare JSON files directly without temporary intermediate files, as shown in README.md [3†L367-L370].
  • jq queries combine iterators (.[]), filters (select()), and aggregators (length) to transform JSON without external dependencies.
  • Output from jq integrates with standard Unix tools like curl, xargs, grep, and colordiff, maintaining the composability of shell pipelines.
  • For interactive exploration, jid and jiq provide alternatives, but jq remains the preferred tool for automated processing.

Frequently Asked Questions

What makes jq the preferred tool for processing JSON at the command line?

jq is a lightweight, portable processor that follows the Unix convention of reading from stdin and writing to stdout, so it chains with standard shell tools like grep, awk, and diff. The jlevy/the-art-of-command-line guide recommends it for JSON work, and because its output is valid JSON, any downstream consumer can parse it.

How do I reliably diff two JSON files that have different formatting or key ordering?

Use jq's --sort-keys flag to canonicalize both files, ensuring object keys appear in alphabetical order and whitespace is normalized. Then use process substitution to feed both outputs directly to diff: diff <(jq --sort-keys . file1.json) <(jq --sort-keys . file2.json). This technique, documented in README.md [3†L367-L370], eliminates false positives caused by formatting differences.

Can jq handle streaming or very large JSON files?

Partly. jq reads a sequence of JSON values from its input and runs the filter on each one in turn, so newline-delimited JSON (NDJSON) such as logs is processed one record at a time with modest memory use. A single large document is different: by default jq parses the whole top-level value into memory before the filter runs, and the iterator .[] only splits it afterwards. For documents too large for that, the --stream flag makes jq emit path and value events incrementally, at the cost of more involved filters.
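A minimal sketch of two cases, using inline sample data (the level, msg, and id fields are invented for illustration). The first handles newline-delimited records one value at a time. The second uses jq's --stream mode with the fromstream and truncate_stream builtins to emit the elements of a single top-level array without first building the whole array.

```shell
# Case 1: NDJSON. The filter runs once per input value.
ndjson=$(printf '%s\n' \
  '{"level":"error","msg":"disk full"}' \
  '{"level":"info","msg":"started"}' \
  '{"level":"error","msg":"timeout"}')
errors=$(printf '%s\n' "$ndjson" | jq -r 'select(.level == "error") | .msg')
printf '%s\n' "$errors"

# Case 2: one top-level array, read incrementally.
# -n stops jq from reading the input itself; inputs then yields the
# stream events, truncate_stream drops the leading array index from each
# path, and fromstream reassembles every element as soon as it is complete.
array='[{"id":1},{"id":2}]'
items=$(printf '%s\n' "$array" | jq -cn --stream 'fromstream(1 | truncate_stream(inputs))')
printf '%s\n' "$items"
```

On a real multi-gigabyte array the second command would read from a file rather than a shell variable; the variable is used here only to keep the sketch self-contained.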

What are the interactive alternatives to jq for exploring JSON data?

The repository mentions jid (JSON incremental digger) and jiq as interactive alternatives for exploratory work where you need real-time feedback on queries [2†L27-L28]. These tools provide terminal-based interfaces for constructing jq-like queries interactively, though for scripts and automated pipelines, the standard jq utility remains the authoritative tool.
