How to Convert HTML to PDF in Node.js Using Puppeteer

Puppeteer converts HTML to PDF by launching a headless Chromium browser and invoking the Page.printToPDF Chrome DevTools Protocol command via the page.pdf() method, which returns the PDF as a Uint8Array that can be saved to disk or sent elsewhere.

Converting HTML documents to PDF is a common requirement for reporting, invoicing, and document archiving in Node.js applications. The puppeteer/puppeteer repository provides a robust solution for HTML-to-PDF conversion in Node.js by automating Chromium through a high-level API. This guide explains the internal architecture and implementation details based on the actual source code.

How Puppeteer Generates PDFs

The PDF generation flow cleanly separates Node-side orchestration from Chrome-side rendering. Three parts of the codebase are involved: the browser launcher, the page.pdf() implementation, and the PDFOptions definitions.

Browser Launch and CDP Communication

The process begins in packages/puppeteer-core/src/node/ChromeLauncher.ts, which constructs the Chromium command line and starts a headless browser instance. This launcher can pass specific PDF-related flags such as --export-tagged-pdf and --generate-pdf-document-outline to enable accessible PDF generation and document bookmarks.
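To make the launch step concrete, here is a minimal sketch of passing those two flags explicitly through the args launch option. The helper names withPdfFlags and launchForPdf are hypothetical, and whether your installed Puppeteer version already includes these flags in its defaults is an assumption worth checking before relying on this.

```javascript
// Flags named in the paragraph above. Whether your Puppeteer version already
// adds them by default is an assumption to verify.
const PDF_FLAGS = ['--export-tagged-pdf', '--generate-pdf-document-outline'];

// Merge the PDF flags into a launch-options object without duplicating any
// flag the caller already supplied.
function withPdfFlags(launchOptions = {}) {
  const args = launchOptions.args ?? [];
  const missing = PDF_FLAGS.filter(flag => !args.includes(flag));
  return {...launchOptions, args: [...args, ...missing]};
}

// Hypothetical helper: launches Chromium with the merged options.
// puppeteer is required lazily so withPdfFlags() stays usable on its own.
async function launchForPdf(launchOptions) {
  const puppeteer = require('puppeteer');
  return puppeteer.launch(withPdfFlags(launchOptions));
}
```

Calling launchForPdf({headless: true}) returns a browser handle, just as puppeteer.launch() does.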

The Page.pdf() Method Implementation

Once the browser is running, the Page class in packages/puppeteer-core/src/cdp/Page.ts (and its BiDi counterpart in packages/puppeteer-core/src/bidi/Page.ts) exposes the high-level API. When you call page.pdf(), the CDP implementation forwards the supplied options to the Page.printToPDF command, while the BiDi implementation issues the equivalent WebDriver BiDi print command (browsingContext.print). Both return a Uint8Array containing the raw PDF data.
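Because the return value is a plain Uint8Array, it can be inspected and written like any other byte array. The sketch below checks for the "%PDF-" marker that PDF files begin with and then writes the bytes with fs.writeFile. The helper names looksLikePdf and savePdf are hypothetical, and page is assumed to be a Puppeteer Page that has already loaded its content.

```javascript
const fs = require('node:fs/promises');

// PDF files begin with the ASCII marker "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Return true when the byte array starts with the PDF marker.
function looksLikePdf(bytes) {
  return bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}

// Hypothetical helper: `page` is a Puppeteer Page that has already loaded content.
async function savePdf(page, filePath) {
  const bytes = await page.pdf({format: 'A4'});
  if (!looksLikePdf(bytes)) {
    throw new Error('page.pdf() did not return PDF data');
  }
  await fs.writeFile(filePath, bytes); // writeFile accepts a Uint8Array directly
  return bytes.length;
}
```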

PDFOptions Configuration

All available knobs for PDF generation are defined in packages/puppeteer-core/src/common/PDFOptions.ts. This file provides type-safe definitions for paper formats, margins, orientation, background rendering, and header/footer templates. These options are validated and marshalled before being sent to Chrome's rendering engine.
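As an illustration of how these options compose, the following sketch merges caller overrides over a preset. The preset values and the reportPdfOptions helper are hypothetical; only the option names (format, landscape, printBackground, margin) come from the options described above.

```javascript
// Defaults for a hypothetical "report" preset; the values are illustrative.
const REPORT_DEFAULTS = {
  format: 'A4',
  landscape: false,
  printBackground: true,
  margin: {top: '20mm', right: '15mm', bottom: '20mm', left: '15mm'},
};

// Merge caller overrides over the preset. Margins are merged per side so that
// overriding `top` does not discard the other three sides.
function reportPdfOptions(overrides = {}) {
  return {
    ...REPORT_DEFAULTS,
    ...overrides,
    margin: {...REPORT_DEFAULTS.margin, ...overrides.margin},
  };
}
```

The result can be passed straight to page.pdf(), for example page.pdf(reportPdfOptions({landscape: true})).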

Basic HTML to PDF Implementation in Node.js

The following example demonstrates the standard workflow for converting a web page to PDF:

const puppeteer = require('puppeteer');

(async () => {
  // 1️⃣ Launch a headless Chromium instance
  const browser = await puppeteer.launch();

  // 2️⃣ Open a new page and navigate to the HTML source
  const page = await browser.newPage();
  await page.goto('https://example.com/report.html', {waitUntil: 'networkidle0'});

  // 3️⃣ Generate the PDF
  await page.pdf({
    path: 'report.pdf',
    format: 'A4',
    printBackground: true,
    margin: {top: '20mm', bottom: '20mm', left: '15mm', right: '15mm'}
  });

  await browser.close();
})();
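When the HTML exists only as a string in memory rather than at a URL, page.setContent() can load it before page.pdf() is called. The sketch below is one way to do that under stated assumptions: renderReportHtml and htmlToPdf are hypothetical helpers, and the markup is a placeholder.

```javascript
// Escape text before interpolating it into markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical template: builds a tiny HTML document from a title and rows.
function renderReportHtml(title, rows) {
  const items = rows.map(row => `<li>${escapeHtml(row)}</li>`).join('');
  return `<!doctype html><html><body><h1>${escapeHtml(title)}</h1><ul>${items}</ul></body></html>`;
}

// Render an HTML string to PDF bytes. puppeteer is required lazily so the
// template helpers above can be used without a browser installed.
async function htmlToPdf(html) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html, {waitUntil: 'networkidle0'});
    return await page.pdf({format: 'A4', printBackground: true});
  } finally {
    await browser.close(); // always release the Chromium process
  }
}
```

The try/finally block ensures the browser is closed even if rendering throws, which matters in long-running services.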

Streaming PDF Data Without File System Writes

For applications that need to process PDFs in memory or send them directly to clients, omit the path option to receive a buffer:

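const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com/report.html');

    // No `path` option: the PDF is returned instead of written to disk
    const pdfBuffer = await page.pdf({
      format: 'Letter',
      printBackground: false
    });
    // pdfBuffer is a Uint8Array suitable for HTTP responses or database storage
    console.log(`Generated ${pdfBuffer.length} bytes`);
  } finally {
    await browser.close();
  }
})();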
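To send the in-memory PDF to a client, the bytes can be written to a Node.js HTTP response with the appropriate headers. This is a minimal sketch: sendPdf and its filename parameter are hypothetical, and res is assumed to be an http.ServerResponse (or any object with the same setHeader and end methods).

```javascript
// Write PDF bytes to a Node.js http.ServerResponse (or anything exposing the
// same setHeader/end methods). `filename` is a hypothetical parameter.
function sendPdf(res, pdfBytes, filename = 'report.pdf') {
  const body = Buffer.from(pdfBytes); // copies the Uint8Array into a Buffer
  res.statusCode = 200;
  res.setHeader('Content-Type', 'application/pdf');
  res.setHeader('Content-Length', body.length);
  res.setHeader('Content-Disposition', `inline; filename="${filename}"`);
  res.end(body);
}
```

Inside an http.createServer handler this would be called as sendPdf(res, await page.pdf({format: 'Letter'})).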

Advanced PDF Options: Headers, Footers, and Layout

Puppeteer supports complex layout requirements through template-based headers and footers defined in PDFOptions.ts:

await page.pdf({
  path: 'invoice.pdf',
  format: 'A4',
  displayHeaderFooter: true,
  headerTemplate: `<span style="font-size:10px;">Invoice #12345</span>`,
  footerTemplate: `<span style="font-size:10px;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>`,
  margin: {top: '30mm', bottom: '30mm'}
});
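When the header text comes from application data, it helps to build the options in one place and validate the value before it reaches the template. The sketch below assumes a hypothetical invoicePdfOptions helper and a restrictive id format; the pageNumber and totalPages classes are the ones used in the example above.

```javascript
// Build page.pdf() options for an invoice with a numbered footer.
// `invoiceId` is restricted to letters, digits, "_" and "-" so it can be
// placed in the header markup without escaping.
function invoicePdfOptions(invoiceId) {
  if (!/^[\w-]+$/.test(invoiceId)) {
    throw new TypeError(`Unsupported invoice id: ${invoiceId}`);
  }
  const style = 'font-size:10px; width:100%; text-align:center;';
  return {
    format: 'A4',
    displayHeaderFooter: true,
    headerTemplate: `<div style="${style}">Invoice #${invoiceId}</div>`,
    footerTemplate: `<div style="${style}">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>`,
    // Margins leave room for the templates (same values as the example above).
    margin: {top: '30mm', bottom: '30mm'},
  };
}
```

A call such as page.pdf({...invoicePdfOptions('12345'), path: 'invoice.pdf'}) would then write the file to disk.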

Summary

  • Puppeteer orchestrates Chromium via the Chrome DevTools Protocol to render HTML as PDF
  • The page.pdf() method in cdp/Page.ts forwards options to Chrome's Page.printToPDF endpoint; bidi/Page.ts provides the equivalent for WebDriver BiDi sessions
  • PDFOptions.ts provides type-safe configuration for margins, format, backgrounds, and templates
  • PDF data returns as a Uint8Array for flexible handling in Node.js applications

Frequently Asked Questions

How does Puppeteer convert HTML to PDF internally?

Puppeteer launches Chromium via ChromeLauncher.ts and sends a Page.printToPDF CDP command through the page.pdf() method implemented in cdp/Page.ts. Chrome renders the page using its print engine and returns a binary PDF payload that Puppeteer wraps in a Uint8Array.

Can I generate PDFs from HTML without saving to disk in Node.js?

Yes. Omit the path option in your PDFOptions to receive a Uint8Array buffer containing the PDF data. This buffer can be streamed to HTTP responses, stored in databases, or processed in memory without touching the file system.

What PDF formatting options does Puppeteer support?

According to PDFOptions.ts, you can specify paper size (format), orientation (landscape), margins, CSS background rendering (printBackground), and custom HTML headers/footers using headerTemplate and footerTemplate with special classes like pageNumber and totalPages.

Does Puppeteer support accessible PDFs with document outlines?

Yes. When launching the browser, ChromeLauncher.ts can pass flags like --export-tagged-pdf and --generate-pdf-document-outline to Chromium, enabling tagged PDF generation and bookmarks based on the HTML document structure.
