
What Data Can Knowledge-Work-Plugins Process: SQL Tables, Files, and PDFs

May 25, 2026 · anthropics/knowledge-work-plugins

The knowledge-work-plugins skills process structured SQL tables from the major data warehouses, local flat files (CSV, Excel, Parquet, JSON), in-memory pandas DataFrames, analytic narratives, and PDF documents through a unified MCP connector abstraction.

The knowledge-work-plugins repository from Anthropic provides a modular suite of skills that turn natural-language prompts into concrete data operations. Each skill targets specific data types and formats, from relational database tables to interactive PDF annotations, so common data workflows run without hand-written glue code.

Structured Relational Data and SQL Tables

The skills in data/skills/explore-data/ and data/skills/write-query/ handle structured relational data across the SQL dialects that the connected MCP server supports.

Profiling Warehouse Tables

The explore-data skill ingests SQL tables from any dialect supported by the MCP server (including PostgreSQL, Snowflake, BigQuery, Redshift, MySQL, DuckDB, and SQLite) and generates data profiles. According to data/skills/explore-data/SKILL.md, the skill returns row and column counts, null rates, cardinality metrics, distribution statistics, and quality flags, with each column classified as a dimension, metric, temporal, or identifier column.
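To make the profile metrics concrete, here is a minimal sketch of how null % and distinct % can be computed with plain SQL aggregates, using Python's built-in sqlite3 module as a stand-in warehouse. It is not the explore-data skill's implementation; the function name `profile_table` and the toy `orders` rows are illustrative assumptions.

```python
import sqlite3


def profile_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Compute null % and distinct % for every column of `table`.

    Toy sketch of warehouse-style profiling, not the explore-data skill's code.
    `table` must be a trusted identifier because it is interpolated into SQL.
    """
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    (n_rows,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    profile = []
    for col in columns:
        # COUNT(col) skips NULLs; COUNT(DISTINCT col) counts distinct non-null values.
        non_null, distinct = conn.execute(
            f'SELECT COUNT("{col}"), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        profile.append(
            {
                "column": col,
                "null_pct": round(100 * (n_rows - non_null) / n_rows, 1) if n_rows else 0.0,
                "distinct_pct": round(100 * distinct / n_rows, 1) if n_rows else 0.0,
            }
        )
    return profile


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "completed"), (2, "completed"), (3, "refunded"), (4, None)],
    )
    for row in profile_table(conn, "orders"):
        print(row)
```

On a real warehouse the `COUNT(col)` / `COUNT(DISTINCT col)` pattern carries over unchanged; only the column-listing query (`PRAGMA table_info` here) would change, for example to `information_schema.columns` on PostgreSQL, MySQL, or Snowflake.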

Generating Optimized SQL

The write-query skill, documented in data/skills/write-query/SKILL.md, parses natural language requests against user-described relational schemas. It discovers warehouse schema metadata when available and emits dialect-specific SQL with common table expressions (CTEs), partition pruning hints, and performance notes tailored to the target engine (Snowflake, BigQuery, etc.).
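In practice, "dialect-specific" largely means that the same intent, such as "the last 30 days", is spelled differently on each engine. The sketch below illustrates this with a hypothetical lookup table (`LAST_N_DAYS`) and helper (`last_n_days_filter`); it is an illustration of the idea, not the write-query skill's code.

```python
# Hypothetical templates: one "within the last N days" predicate per dialect.
LAST_N_DAYS = {
    "snowflake": "{col} >= DATEADD(day, -{n}, CURRENT_DATE())",
    "redshift": "{col} >= DATEADD(day, -{n}, CURRENT_DATE)",
    "bigquery": "{col} >= DATE_SUB(CURRENT_DATE(), INTERVAL {n} DAY)",
    "mysql": "{col} >= DATE_SUB(CURDATE(), INTERVAL {n} DAY)",
    "postgresql": "{col} >= CURRENT_DATE - INTERVAL '{n} days'",
    "duckdb": "{col} >= CURRENT_DATE - INTERVAL {n} DAY",
    # SQLite compares ISO-8601 text dates, so the column must store 'YYYY-MM-DD' strings.
    "sqlite": "{col} >= date('now', '-{n} days')",
}


def last_n_days_filter(dialect: str, column: str, days: int) -> str:
    """Render a dialect-specific predicate; `column` must be a trusted identifier."""
    if not isinstance(days, int) or days < 0:
        raise ValueError("days must be a non-negative integer")
    template = LAST_N_DAYS.get(dialect.lower())
    if template is None:
        raise ValueError(f"unsupported dialect: {dialect!r}")
    return template.format(col=column, n=days)
```

For example, `last_n_days_filter("bigquery", "event_date", 30)` returns `event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)`, while the Snowflake entry produces the same `DATEADD` predicate used in the Snowflake sample query in this article.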

Flat Files and Local Data Formats

Beyond warehouse connections, the explore-data skill processes local file formats including CSV, Excel, Parquet, and JSON. The skill applies the same profiling algorithm used for SQL tables, treating flat files as ephemeral tables to generate cardinality reports and quality assessments without requiring a persistent database connection.
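A minimal sketch of the "ephemeral table" idea, assuming only the Python standard library: CSV text is loaded into a SQLite `TEMP` table that disappears when the connection closes, after which ordinary SQL profiling queries apply. The helper name `load_csv_as_table` is hypothetical, and the sketch covers CSV only; Excel, Parquet, and JSON would each need their own reader.

```python
import csv
import io
import sqlite3


def load_csv_as_table(conn: sqlite3.Connection, csv_text: str, table: str) -> int:
    """Load CSV text into a TEMP table on `conn` and return the number of rows loaded.

    Sketch only: every column is stored as TEXT, empty cells become NULL, and
    `table` plus the CSV header names are assumed to be trusted identifiers.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    column_defs = ", ".join(f'"{name}" TEXT' for name in header)
    # TEMP tables live only as long as the connection, hence "ephemeral".
    conn.execute(f'CREATE TEMP TABLE "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([cell if cell != "" else None for cell in row] for row in reader),
    )
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = load_csv_as_table(conn, "order_id,status\n1,completed\n2,\n", "orders_csv")
    (non_null_status,) = conn.execute("SELECT COUNT(status) FROM orders_csv").fetchone()
    print(rows, non_null_status)  # 2 rows loaded, 1 non-null status
```

Mapping empty cells to NULL matters here: without it, null-rate checks on a CSV would report 0% for columns that are in fact blank.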

Analytic Narratives and Validation Targets

The validate-data skill, defined in data/skills/validate-data/SKILL.md, processes unstructured and semi-structured analytic artifacts rather than raw data. It accepts:

SQL queries (for methodology review)
Chart screenshots (for visual accuracy checks)
Markdown and Jupyter Notebook files
Natural language analytical narratives

The skill returns structured validation reports assessing calculation correctness, bias, presentation quality, and confidence levels with actionable improvement suggestions.
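One way to model such a report is as a list of findings whose most severe entry drives the overall verdict. The sketch below is hypothetical throughout: the `Finding` and `ValidationReport` classes, the severity scale, and the labels "Ready to share" and "Needs revision" are assumptions; only "Share with caveats" appears in the sample report in this article.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale: a higher number means a more serious finding.
SEVERITY_RANK = {"info": 0, "warning": 1, "blocker": 2}
# Only "Share with caveats" comes from the sample report; the other labels are assumed.
VERDICT_BY_RANK = {0: "Ready to share", 1: "Share with caveats", 2: "Needs revision"}


@dataclass
class Finding:
    category: str  # e.g. "calculation", "bias", "presentation"
    severity: str  # a key of SEVERITY_RANK
    message: str
    suggestion: str = ""


@dataclass
class ValidationReport:
    findings: list[Finding] = field(default_factory=list)

    def overall_assessment(self) -> str:
        """The single most severe finding decides the verdict."""
        worst = max((SEVERITY_RANK[f.severity] for f in self.findings), default=0)
        return VERDICT_BY_RANK[worst]
```

With this design, a report holding one "warning" finding, such as a fiscal-calendar change between quarters, evaluates to "Share with caveats", while an empty report is "Ready to share".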

In-Memory Data Structures for Visualization

The data-visualization skill, defined in data/skills/data-visualization/SKILL.md, operates on pandas DataFrames (in-memory), CSV/Parquet files, or query results supplied by upstream skills. It generates ready-to-run Python code using matplotlib, seaborn, or plotly, complete with chart-type recommendations, design principles, and accessibility checklists.
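Chart-type recommendations can be thought of as a mapping from the kinds of columns being plotted to a chart. The rules below encode a common rule of thumb as a hypothetical `recommend_chart` function; they are not the data-visualization skill's actual logic.

```python
from typing import Optional


def recommend_chart(x_kind: str, y_kind: Optional[str] = None) -> str:
    """Map column kinds ("temporal", "categorical", "numeric") to a chart type."""
    rules = {
        ("temporal", "numeric"): "line",      # trend over time
        ("categorical", "numeric"): "bar",    # comparison across groups
        ("numeric", "numeric"): "scatter",    # relationship between two measures
        ("numeric", None): "histogram",       # distribution of one measure
        ("categorical", None): "bar",         # frequency of each category
    }
    # Fall back to a plain table when no rule matches.
    return rules.get((x_kind, y_kind), "table")
```

A monthly active users series (temporal x, numeric y) maps to "line", which matches the chart type in the MAU example in this article.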

Schema Metadata and Context Extraction

The data-context-extractor skill introspects any data source available through the MCP server to extract structured metadata. As documented in data/skills/data-context-extractor/SKILL.md, it returns table definitions, column comments, relationship mappings, and domain templates that downstream skills use for prompt engineering and automated query generation.
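For SQLite, comparable metadata can be read with built-in pragmas. This sketch, built around a hypothetical `extract_schema` function, collects column definitions and foreign-key relationships. SQLite has no column comments, so in a real connector those would come from each warehouse's own catalog (for example, its `information_schema` views).

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: {"columns": [...], "foreign_keys": [...]}} for a SQLite database."""
    schema = {}
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        columns = [
            {"name": name, "type": col_type, "nullable": not notnull, "primary_key": bool(pk)}
            for _cid, name, col_type, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [
            {"column": row[3], "references": f"{row[2]}.{row[4]}"}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema
```

Relationship mappings such as `events.product_id -> products.product_id` are exactly what a query generator needs to choose join keys without guessing.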

PDF Documents and Interactive Reports

The view-pdf skill in pdf-viewer/skills/view-pdf/SKILL.md handles document-centric workflows. It processes local PDF files and HTTPS-served PDFs (from sources like arXiv or Zenodo) through an interactive viewer API. Users can highlight text, add annotations, fill forms, and place image stamps. Note that text extraction is delegated to the native Read tool; the skill focuses on visual interaction and markup rather than parsing.

The MCP Connector Architecture

All skills remain agnostic to the underlying storage engine by relying on the Model Context Protocol (MCP) connector abstraction. This architecture lets the same skill logic work with PostgreSQL, Snowflake, BigQuery, MySQL, DuckDB, and SQLite without code changes. Each skill delegates connector discovery to the repository’s CONNECTORS.md reference, then invokes the appropriate driver (SQL, pandas, or PDF viewer API) based on the detected data type.
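The routing step can be sketched as a small classifier that decides which driver a source string needs. Everything here is an assumption for illustration (`route_source`, `DRIVER_FOR_KIND`, the suffix list, and treating any HTTPS URL as a PDF); the repository's actual connector discovery is described in CONNECTORS.md.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical suffix list and driver names, mirroring the data types described above.
FLAT_FILE_SUFFIXES = {".csv", ".xlsx", ".xls", ".parquet", ".json"}
DRIVER_FOR_KIND = {"sql_table": "SQL", "flat_file": "pandas", "pdf": "PDF viewer API"}


def route_source(source: str) -> str:
    """Classify a source string as "sql_table", "flat_file", or "pdf"."""
    scheme = urlparse(source).scheme.lower()
    if scheme == "https":
        # Assumption: the only remote inputs are HTTPS-served PDFs (arXiv, Zenodo, ...).
        return "pdf"
    if scheme == "http":
        raise ValueError("remote PDFs must be served over HTTPS")
    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in FLAT_FILE_SUFFIXES:
        return "flat_file"
    # Anything else, e.g. "sales.orders", is treated as a warehouse table name.
    return "sql_table"
```

Routing on the source string alone keeps each skill ignorant of storage details: `DRIVER_FOR_KIND[route_source("exports/orders.csv")]` yields `"pandas"`, while `route_source("sales.orders")` falls through to the SQL driver.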

Practical Code Examples

Each example below shows the slash command that invokes a skill, its natural-language argument, and a representative output.

Profiling a SQL Table

/explore-data sales.orders

Sample response:


## Data Profile: sales.orders

### Overview

- Rows: 2,340,891
- Columns: 23 (8 dimensions, 6 metrics, 4 dates, 5 IDs)
- Date range: 2021-03-15 to 2024-01-22

### Column Details

| Column      | Type       | Null % | Distinct % | Top Value (share) |
|-------------|------------|--------|------------|-------------------|
| order_id    | Identifier | 0.0    | 100.0      | n/a               |
| status      | Dimension  | 0.2    | 12.4       | completed (45%)   |
| revenue_usd | Metric     | 0.0    | 99.9       | n/a               |
| created_at  | Temporal   | 0.0    | 99.8       | n/a               |

Generating Dialect-Specific SQL

/write-query Top 10 products by revenue in the last 30 days, using Snowflake syntax

Sample response:

-- Snowflake-optimized query (CTEs; the event_date filter enables partition pruning)
WITH recent_events AS (
    SELECT product_id, revenue_usd  -- read only the columns the query needs
    FROM sales.events
    WHERE event_date >= DATEADD(day, -30, CURRENT_DATE())
),
product_revs AS (
    SELECT
        product_id,
        SUM(revenue_usd) AS total_revenue
    FROM recent_events
    GROUP BY product_id
)
SELECT
    p.product_name,
    pr.total_revenue
FROM product_revs pr
JOIN dim.products p
    ON pr.product_id = p.product_id
ORDER BY pr.total_revenue DESC
LIMIT 10;

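The skill appends performance notes; here, the relevant one is that Snowflake can prune micro-partitions using the event_date filter. Pruning relies on per-micro-partition min/max metadata rather than on where the filter appears in the query text, so it is most effective when the table is clustered (or naturally loaded in order) by event_date.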

Validating Analytic Work

/validate-data Review this quarterly revenue analysis before I send it to exec:

Sample response:


## Validation Report

### Overall Assessment: Share with caveats

- **Methodology** – Correctly aggregates by quarter, but the `order_date` filter is applied after the GROUP BY; move it into the WHERE clause so rows are filtered before aggregation.
- **Issues Found**
  1. **High null rate** on `discount_code` (23%) – flag as “optional”.
  2. **Denominator shifting** – Q4 uses a different fiscal calendar; adjust for consistency.
- **Suggested Improvements**
  - Add a “% of total revenue” column.
  - Document the fiscal calendar change in the narrative.
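A check like the "High null rate" finding above is straightforward to reproduce. This is a minimal sketch over rows represented as dictionaries; the function name `flag_high_null_rates` and the 20% default threshold are assumptions, not values taken from the validate-data skill.

```python
def flag_high_null_rates(rows: list[dict], threshold_pct: float = 20.0) -> dict[str, float]:
    """Return {column: null %} for columns whose null rate exceeds threshold_pct.

    A missing key and an explicit None both count as null. The 20% default is
    an arbitrary assumption, not a value from the validate-data skill.
    """
    if not rows:
        return {}
    columns = {key for row in rows for key in row}
    flagged = {}
    for col in sorted(columns):
        nulls = sum(1 for row in rows if row.get(col) is None)
        pct = round(100 * nulls / len(rows), 1)
        if pct > threshold_pct:
            flagged[col] = pct
    return flagged
```

Counting missing keys as nulls matters for semi-structured exports, where optional fields such as a discount code are often omitted rather than set to null.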

Creating Data Visualizations

/data-visualization Show a line chart of monthly active users (MAU) over the past year

Sample Python snippet:

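import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd

# Assumption: `df` comes from an upstream query; here it is read from a
# hypothetical mau.csv with columns month (date) and mau (int).
df = pd.read_csv('mau.csv', parse_dates=['month']).sort_values('month')
df = df[df['month'] > df['month'].max() - pd.DateOffset(months=12)]  # keep the last 12 months

# 'seaborn-v0_8-whitegrid' needs matplotlib >= 3.6; older releases call it 'seaborn-whitegrid'
plt.style.use('seaborn-v0_8-whitegrid')
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(df['month'], df['mau'], marker='o', linewidth=2, color='#4C72B0')
ax.set_title('Monthly Active Users – Last 12 Months', fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('MAU')
ax.yaxis.set_major_formatter(mticker.StrMethodFormatter('{x:,.0f}'))  # thousands separators
# Hide the top and right borders to reduce visual clutter
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.autofmt_xdate()  # rotate date labels so they do not overlap
plt.tight_layout()
plt.savefig('mau_line_chart.png', dpi=150)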

Interacting with PDFs

/view-pdf /files/contracts/2024-service-agreement.pdf

The skill initiates an interactive workflow:

display_pdf returns a viewUUID
interact → highlight_text marks specific clauses (e.g., "Effective Date")
interact → add_annotations inserts stamps like "CONFIDENTIAL"
interact → get_screenshot confirms visual placement

Summary

knowledge-work-plugins processes relational data via the explore-data and write-query skills, supporting PostgreSQL, Snowflake, BigQuery, Redshift, MySQL, DuckDB, and SQLite.
Flat files (CSV, Excel, Parquet, JSON) are profiled with the same logic as SQL tables, treated as ephemeral tables rather than requiring a persistent database connection.
The validate-data skill targets analytic artifacts—SQL queries, charts, Markdown files, and narratives—for methodology and bias checking.
Visualization workflows accept pandas DataFrames, CSV/Parquet files, or upstream query results and generate ready-to-run Python code (matplotlib/seaborn/plotly) via the data-visualization skill.
PDF documents are handled interactively by the view-pdf skill (highlighting, annotations, form filling, image stamps); text extraction is left to the native Read tool.
All data operations route through the Model Context Protocol (MCP) connector layer, which keeps the skills portable across storage engines.

Frequently Asked Questions

What file formats can knowledge-work-plugins read from local storage?

The explore-data skill reads CSV, Excel, Parquet, and JSON files from local storage, applying the same profiling logic (null rates, cardinality, distribution stats) used for SQL warehouse tables. These files are treated as ephemeral tables during the analysis phase.

Which SQL dialects does the write-query skill support?

The write-query skill supports the major warehouse dialects, including PostgreSQL, Snowflake, BigQuery, Redshift, MySQL, DuckDB, and SQLite. It emits syntax specific to the target engine (for example, Snowflake's DATEADD for date arithmetic) and generates queries with CTEs and partition-pruning hints appropriate to that engine.

Can knowledge-work-plugins automatically extract text from PDFs?

Not through the view-pdf skill. According to pdf-viewer/skills/view-pdf/SKILL.md, the skill provides an interactive viewer for annotation, highlighting, and form filling but does not extract text itself. Text extraction is delegated to the native Read tool, while the skill focuses on visual manipulation and markup workflows.

How does the validate-data skill handle different types of analytic content?

The validate-data skill accepts diverse inputs including SQL queries (for logic verification), chart screenshots (for visual assessment), Markdown/Notebook files, and natural language summaries. It applies a taxonomy of checks covering methodology correctness, calculation accuracy, bias detection, and presentation quality, returning a structured confidence assessment with specific caveats and improvement suggestions.
