# What Data Can Knowledge-Work-Plugins Process: SQL Tables, Files, and PDFs

> An overview of the data that knowledge-work-plugins skills can process: SQL tables, local files such as CSV and Excel, Pandas DataFrames, and PDF documents.

- Repository: [anthropics/knowledge-work-plugins](https://github.com/anthropics/knowledge-work-plugins)
- Tags: how-to-guide
- Published: 2026-05-25

---

**Knowledge-work-plugins skills process structured SQL tables from any major data warehouse, local flat files (CSV, Excel, Parquet, JSON), in-memory Pandas DataFrames, analytic narratives, and PDF documents through a unified MCP connector abstraction.**

The **knowledge-work-plugins** repository from Anthropic provides a modular suite of skills that transform natural language prompts into concrete data operations. Each skill targets specific data types and formats, from relational database tables to interactive PDF annotations, so common data workflows need little or no boilerplate code.

## Structured Relational Data and SQL Tables

The skills in `data/skills/explore-data/` and `data/skills/write-query/` handle structured relational data across multiple SQL dialects.

### Profiling Warehouse Tables

The `explore-data` skill ingests SQL tables from any dialect supported by the MCP server—including PostgreSQL, Snowflake, BigQuery, Redshift, MySQL, DuckDB, and SQLite—and generates comprehensive data profiles. According to the implementation in [`data/skills/explore-data/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/data/skills/explore-data/SKILL.md), the skill returns row/column counts, null rates, cardinality metrics, distribution statistics, and quality flags for dimensional, metric, temporal, and identifier columns.
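To make these metrics concrete, here is a minimal sketch of how row counts, null rates, and cardinality could be computed with plain SQL. It uses Python's built-in `sqlite3` module with a hypothetical `orders` table and made-up rows. The `profile_table` helper is an illustration of the metrics only, not the skill's actual implementation.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate the metrics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, revenue_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "completed", 20.0),
        (2, "completed", 35.5),
        (3, None, 12.0),
        (4, "refunded", 35.5),
    ],
)


def profile_table(conn, table, columns):
    """Return the row count plus null rate and distinct count per column."""
    # Identifiers are interpolated directly, so pass trusted names only.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    profile = {"rows": total, "columns": {}}
    for col in columns:
        # COUNT(col) and COUNT(DISTINCT col) both ignore NULL values.
        non_null, distinct = conn.execute(
            f"SELECT COUNT({col}), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        profile["columns"][col] = {
            "null_pct": round(100 * (total - non_null) / total, 1) if total else 0.0,
            "distinct": distinct,
        }
    return profile


report = profile_table(conn, "orders", ["order_id", "status", "revenue_usd"])
for name, stats in report["columns"].items():
    print(name, stats)
```

The same three aggregates (`COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`) are available in all of the engines listed above, which is why a profile of this kind ports easily between dialects.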

### Generating Optimized SQL

The `write-query` skill, documented in [`data/skills/write-query/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/data/skills/write-query/SKILL.md), parses natural language requests against user-described relational schemas. It discovers warehouse schema metadata when available and emits dialect-specific SQL with common table expressions (CTEs), partition pruning hints, and performance notes tailored to the target engine (Snowflake, BigQuery, etc.).
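Dialect differences are easiest to see in date arithmetic. The sketch below maps a few engines to a common way of expressing a "last N days" filter in each. `LAST_N_DAYS` and `recent_filter` are hypothetical names chosen for illustration; they are not part of the skill, and the skill may phrase these expressions differently.

```python
# Hypothetical lookup of per-dialect date arithmetic for "today minus N days".
LAST_N_DAYS = {
    "snowflake": "DATEADD(day, -{n}, CURRENT_DATE())",
    "bigquery": "DATE_SUB(CURRENT_DATE(), INTERVAL {n} DAY)",
    "postgresql": "CURRENT_DATE - INTERVAL '{n} days'",
    "mysql": "DATE_SUB(CURDATE(), INTERVAL {n} DAY)",
    "sqlite": "date('now', '-{n} days')",
}


def recent_filter(dialect: str, column: str, days: int) -> str:
    """Build a WHERE-clause fragment selecting the last `days` days."""
    try:
        template = LAST_N_DAYS[dialect.lower()]
    except KeyError:
        raise ValueError(f"unsupported dialect: {dialect}") from None
    return f"{column} >= {template.format(n=days)}"


print(recent_filter("snowflake", "event_date", 30))
print(recent_filter("bigquery", "event_date", 30))
```

The logical request is identical in every case; only the function names and interval syntax change, which is the kind of variation a dialect-aware generator has to absorb.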

## Flat Files and Local Data Formats

Beyond warehouse connections, the `explore-data` skill processes local file formats including **CSV**, **Excel**, **Parquet**, and **JSON**. The skill applies the same profiling algorithm used for SQL tables, treating flat files as ephemeral tables to generate cardinality reports and quality assessments without requiring a persistent database connection.
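One way to picture "flat files as ephemeral tables" is to load a CSV into an in-memory database that vanishes when the connection closes. The following is a minimal sketch using only the Python standard library. The CSV contents, the `orders_csv` table name, and the `load_csv_as_table` helper are assumptions for illustration, and the skill itself may take a different approach.

```python
import csv
import io
import sqlite3

# Stand-in for a local CSV file; the contents are made up for illustration.
csv_text = "order_id,status,revenue_usd\n1,completed,20.0\n2,,35.5\n3,refunded,12.0\n"


def load_csv_as_table(conn, table, text):
    """Copy CSV rows into a new table, storing empty fields as NULL."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Column names come straight from the header, so use trusted files only.
    conn.execute(f"CREATE TABLE {table} ({', '.join(header)})")
    placeholders = ", ".join("?" for _ in header)
    rows = [[value if value != "" else None for value in row] for row in reader]
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


# The in-memory database disappears when the connection closes.
conn = sqlite3.connect(":memory:")
loaded = load_csv_as_table(conn, "orders_csv", csv_text)
null_status = conn.execute(
    "SELECT COUNT(*) - COUNT(status) FROM orders_csv"
).fetchone()[0]
print(f"{loaded} rows loaded, {null_status} with a missing status")
```

Once the file is exposed as a table, the same SQL-based null-rate and cardinality checks apply without a persistent database.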

## Analytic Narratives and Validation Targets

The `validate-data` skill, defined in [`data/skills/validate-data/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/data/skills/validate-data/SKILL.md), processes unstructured and semi-structured analytic artifacts rather than raw data. It accepts:

- SQL queries (for methodology review)
- Chart screenshots (for visual accuracy checks)
- Markdown and Jupyter Notebook files
- Natural language analytical narratives

The skill returns structured validation reports assessing calculation correctness, bias, presentation quality, and confidence levels with actionable improvement suggestions.

## In-Memory Data Structures for Visualization

The `data-visualization` skill, implemented in [`data/skills/data-visualization/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/data/skills/data-visualization/SKILL.md), operates on **Pandas DataFrames** (in-memory), CSV/Parquet files, or query results supplied by upstream skills. It generates ready-to-run Python code using matplotlib, seaborn, or plotly, complete with chart-type recommendations, design principles, and accessibility checklists.

## Schema Metadata and Context Extraction

The `data-context-extractor` skill introspects any data source available through the MCP server to extract structured metadata. As documented in [`data/skills/data-context-extractor/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/data/skills/data-context-extractor/SKILL.md), it returns table definitions, column comments, relationship mappings, and domain templates that downstream skills use for prompt engineering and automated query generation.
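As a rough illustration of what schema introspection involves, the sketch below collects column definitions and foreign-key relationships from a SQLite database using its built-in `PRAGMA` statements. The two-table schema and the `extract_schema` helper are hypothetical, and SQLite has no column comments, so that part of the metadata is not covered here. Other engines expose similar information through their own catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table schema, used only to have something to introspect.
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE events (
        event_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(product_id),
        revenue_usd REAL
    );
    """
)


def extract_schema(conn):
    """Collect column definitions and foreign keys for every table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": r[1], "type": r[2]}
            for r in conn.execute(f"PRAGMA table_info({table})")
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        relations = [
            {"column": r[3], "references": f"{r[2]}.{r[4]}"}
            for r in conn.execute(f"PRAGMA foreign_key_list({table})")
        ]
        schema[table] = {"columns": columns, "relations": relations}
    return schema


schema = extract_schema(conn)
print(schema["events"]["relations"])
```

A dictionary like this is the sort of structured context a query generator can consult to pick join keys and column names.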

## PDF Documents and Interactive Reports

The `view-pdf` skill in [`pdf-viewer/skills/view-pdf/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/pdf-viewer/skills/view-pdf/SKILL.md) handles document-centric workflows. It processes local PDF files and HTTPS-served PDFs (from sources like arXiv or Zenodo) through an interactive viewer API. Users can highlight text, add annotations, fill forms, and place image stamps. Note that text extraction is delegated to the native `Read` tool; the skill focuses on visual interaction and markup rather than parsing.

## The MCP Connector Architecture

All skills remain **agnostic to the underlying storage engine** by relying on the Model Context Protocol (MCP) connector abstraction. This architecture allows the same skill logic to communicate with PostgreSQL, Snowflake, BigQuery, Redshift, MySQL, DuckDB, and SQLite without code changes. Each skill delegates connector discovery to the repository’s [`CONNECTORS.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/CONNECTORS.md) reference, then uses the appropriate interface (SQL, pandas, or the PDF viewer API) based on the detected data type.
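The general idea of engine-agnostic logic can be shown with a small interface sketch. This is not how MCP or the plugins are implemented: `TableSource`, `SQLiteSource`, `InMemorySource`, and `count_rows` are hypothetical names, used only to show one function working unchanged against two different backends.

```python
import sqlite3
from typing import Protocol


class TableSource(Protocol):
    """Hypothetical interface: anything that can return rows for a table."""

    def fetch_rows(self, table: str) -> list[tuple]: ...


class SQLiteSource:
    """Backend that reads rows from a SQLite connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def fetch_rows(self, table: str) -> list[tuple]:
        # Table name is interpolated, so pass trusted names only.
        return self.conn.execute(f"SELECT * FROM {table}").fetchall()


class InMemorySource:
    """Backend that serves rows from plain Python lists."""

    def __init__(self, tables: dict[str, list[tuple]]) -> None:
        self.tables = tables

    def fetch_rows(self, table: str) -> list[tuple]:
        return list(self.tables[table])


def count_rows(source: TableSource, table: str) -> int:
    """Engine-agnostic logic: relies only on the TableSource interface."""
    return len(source.fetch_rows(table))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])

sqlite_count = count_rows(SQLiteSource(conn), "orders")
memory_count = count_rows(InMemorySource({"orders": [(1,), (2,)]}), "orders")
print(sqlite_count, memory_count)
```

Because `count_rows` depends only on the interface, adding another backend means writing one new class rather than touching the analysis logic.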

## Practical Code Examples

Below are the natural language slash commands used to invoke each skill type, with representative outputs.

### Profiling a SQL Table

```text
/explore-data sales.orders
```

**Sample response:**

```markdown
## Data Profile: sales.orders

### Overview

- Rows: 2,340,891
- Columns: 23 (8 dimensions, 6 metrics, 4 dates, 5 IDs)
- Date range: 2021-03-15 to 2024-01-22

### Column Details

| Column      | Type       | Null % | Distinct % | Top Values (freq) |
|-------------|------------|--------|------------|-------------------|
| order_id    | Identifier | 0.0    | 100.0      | -                 |
| status      | Dimension  | 0.2    | 12.4       | completed (45%)   |
| revenue_usd | Metric     | 0.0    | 99.9       | -                 |
| created_at  | Temporal   | 0.0    | 99.8       | -                 |
```

### Generating Dialect-Specific SQL

```text
/write-query Top 10 products by revenue in the last 30 days, using Snowflake syntax
```

**Sample response:**

```sql
-- Snowflake-optimized query (uses CTEs, partition pruning)
WITH recent_events AS (
    SELECT *
    FROM sales.events
    WHERE event_date >= DATEADD(day, -30, CURRENT_DATE())
),
product_revs AS (
    SELECT
        product_id,
        SUM(revenue_usd) AS total_revenue
    FROM recent_events
    GROUP BY product_id
)
SELECT
    p.product_name,
    pr.total_revenue
FROM product_revs pr
JOIN dim.products p
    ON pr.product_id = p.product_id
ORDER BY pr.total_revenue DESC
LIMIT 10;
```

The skill appends performance notes, for example that Snowflake can prune micro-partitions using the `event_date` filter. Pruning relies on the per-partition metadata Snowflake keeps for the filtered column, not on where the filter appears in the query text.

### Validating Analytic Work

```text
/validate-data Review this quarterly revenue analysis before I send it to exec:
```

**Sample response:**

```markdown
## Validation Report

### Overall Assessment: Share with caveats

- **Methodology** – Correctly aggregates by quarter, but the `order_date` filter is applied after the `GROUP BY`; moving it into the `WHERE` clause would filter rows earlier and improve performance.
- **Issues Found**
  1. **High null rate** on `discount_code` (23%) – flag as “optional”.
  2. **Denominator shifting** – Q4 uses a different fiscal calendar; adjust for consistency.
- **Suggested Improvements**
  - Add a “% of total revenue” column.
  - Document the fiscal calendar change in the narrative.
```

### Creating Data Visualizations

```text
/data-visualization Show a line chart of monthly active users (MAU) over the past year
```

**Sample Python snippet:**

```python
import matplotlib.pyplot as plt
import pandas as pd

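# Placeholder data so the snippet runs on its own; replace `df` with your real
# DataFrame containing columns: month (datetime), mau (int)
df = pd.DataFrame({
    'month': pd.date_range('2023-01-01', periods=12, freq='MS'),
    'mau': range(1000, 13000, 1000),
})
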
plt.style.use('seaborn-v0_8-whitegrid')
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(df['month'], df['mau'], marker='o', linewidth=2, color='#4C72B0')
ax.set_title('Monthly Active Users – Last 12 Months', fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('MAU')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.tight_layout()
plt.savefig('mau_line_chart.png', dpi=150)
```

### Interacting with PDFs

```text
/view-pdf /files/contracts/2024-service-agreement.pdf
```

The skill initiates an interactive workflow:
1. `display_pdf` returns a `viewUUID`
2. `interact` → `highlight_text` marks specific clauses (e.g., "Effective Date")
3. `interact` → `add_annotations` inserts stamps like "CONFIDENTIAL"
4. `interact` → `get_screenshot` confirms visual placement

## Summary

- **knowledge-work-plugins** processes relational data via the `explore-data` and `write-query` skills, supporting PostgreSQL, Snowflake, BigQuery, Redshift, MySQL, DuckDB, and SQLite.
- Flat files (CSV, Excel, Parquet, JSON) are profiled using the same engine as SQL tables through the MCP connector abstraction.
- The `validate-data` skill targets analytic artifacts (SQL queries, chart screenshots, Markdown and Jupyter Notebook files, and narratives) for methodology and bias checking.
- Visualization workflows accept Pandas DataFrames and generate ready-to-run Python code (matplotlib, seaborn, or plotly) via the `data-visualization` skill.
- PDF documents are handled interactively by the `view-pdf` skill, allowing annotation and markup without text extraction.
- All data operations route through the MCP (Model Context Protocol) layer, which keeps skills portable across storage engines.

## Frequently Asked Questions

### What file formats can knowledge-work-plugins read from local storage?

The `explore-data` skill reads **CSV**, **Excel**, **Parquet**, and **JSON** files from local storage, applying the same profiling logic (null rates, cardinality, distribution stats) used for SQL warehouse tables. These files are treated as ephemeral tables during the analysis phase.

### Which SQL dialects does the write-query skill support?

The `write-query` skill targets major SQL dialects, including **PostgreSQL**, **Snowflake**, **BigQuery**, **Redshift**, **MySQL**, **DuckDB**, and **SQLite**. It emits dialect-specific syntax and generates queries with CTEs and partition pruning hints appropriate to the target engine.

### Can knowledge-work-plugins automatically extract text from PDFs?

No. According to [`pdf-viewer/skills/view-pdf/SKILL.md`](https://github.com/anthropics/knowledge-work-plugins/blob/main/pdf-viewer/skills/view-pdf/SKILL.md), the `view-pdf` skill provides an **interactive viewer** for annotation, highlighting, and form filling, but it does not perform text extraction. Text extraction is delegated to the native `Read` tool; the skill focuses on visual manipulation and markup workflows.

### How does the validate-data skill handle different types of analytic content?

The `validate-data` skill accepts diverse inputs including SQL queries (for logic verification), chart screenshots (for visual assessment), Markdown/Notebook files, and natural language summaries. It applies a taxonomy of checks covering methodology correctness, calculation accuracy, bias detection, and presentation quality, returning a structured confidence assessment with specific caveats and improvement suggestions.