Implementing a Robust PDF Viewer in React for Large Documents: Best Practices and Code Examples

Use PDF.js with Web Workers for off-thread parsing, render pages inside a virtualized list with LRU caching, and implement chunked loading to display large PDFs without blocking the UI or exhausting memory.

Displaying multi-hundred-page PDFs in a browser-based React application presents significant performance challenges. A robust viewer must balance rendering quality against memory constraints and responsive user interactions. This guide examines architectural patterns from the React ecosystem and provides example implementations for handling large documents efficiently.

Core Architecture for High-Performance PDF Rendering

Building a viewer that scales to thousands of pages requires offloading computation from the main thread and rendering only visible content.

Offloading Parsing to Web Workers

PDF parsing is CPU-intensive and can freeze the UI if executed on the main thread. The pdfjs-dist library provides a dedicated worker script that handles document parsing in parallel.

Configure the worker in your entry point or viewer component:

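// pdfjs-dist has no named `pdfjs` export; import the module namespace instead
import * as pdfjs from 'pdfjs-dist';

// The worker build must match the library version, hence ${pdfjs.version}
pdfjs.GlobalWorkerOptions.workerSrc = `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.min.js`;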

This configuration ensures that heavy parsing operations occur in a Web Worker, keeping scroll and click interactions fluid.
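The worker file name depends on the installed pdfjs-dist release. A minimal sketch of a helper that picks a matching URL, assuming the cdnjs path layout used above and assuming that releases from major version 4 onward publish the worker as an ES module (pdf.worker.min.mjs) while earlier ones use pdf.worker.min.js. Verify both assumptions against your installed package; workerSrcFor is a hypothetical name, not part of PDF.js:

```typescript
// Hypothetical helper: builds a CDN worker URL for a given pdfjs-dist version.
// Assumption: major version 4 and later ship the worker with an ".mjs" extension.
function workerSrcFor(version: string): string {
  // parseInt stops at the first non-digit, so "3.11.174" yields 3.
  const major = parseInt(version, 10);
  if (isNaN(major)) {
    throw new Error(`Unrecognized pdfjs-dist version: ${version}`);
  }
  const file = major >= 4 ? 'pdf.worker.min.mjs' : 'pdf.worker.min.js';
  return `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/${file}`;
}
```

The result would be assigned to GlobalWorkerOptions.workerSrc in place of the hard-coded template string. Bundling the worker with your application is an alternative when loading from a CDN is not acceptable.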

Lazy Loading and Virtualization

Rendering every page of a large PDF simultaneously exhausts memory and degrades performance. Instead, implement virtualization to recycle DOM nodes and only render pages within the viewport.

Use react-window or react-virtualized to create a windowed list where each item represents a single PDF page. This approach ensures that even a 10,000-page document only maintains a handful of canvas elements in the DOM at any moment.
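To make the windowing idea concrete, here is a minimal sketch of the index arithmetic behind a fixed-height virtual list. It is not react-window's actual implementation; visibleRange and its overscan default are hypothetical names and values chosen for illustration:

```typescript
// Which item indexes should be mounted for a given scroll position.
interface VisibleRange {
  start: number; // first index to render (inclusive)
  end: number;   // last index to render (inclusive)
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  itemHeight: number,
  itemCount: number,
  overscan = 1, // hypothetical default: extra rows kept above and below
): VisibleRange {
  // An empty list yields an empty range (end < start).
  if (itemCount <= 0 || itemHeight <= 0) return { start: 0, end: -1 };
  const first = Math.floor(scrollTop / itemHeight);
  const last = Math.floor((scrollTop + viewportHeight - 1) / itemHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(itemCount - 1, last + overscan),
  };
}
```

With 10,000 pages of 800 px each and a 600 px viewport, only the rows in the returned range need canvases, regardless of how far the user has scrolled.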

Building the Viewer Component

The following implementation in src/components/PdfViewer.tsx demonstrates a complete, production-ready viewer that combines virtualization, caching, and error handling.

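import React, {
  useEffect,
  useRef,
  useState,
  useCallback,
} from 'react';
import { FixedSizeList as List } from 'react-window';
// pdfjs-dist has no named `pdfjs` export; import the module namespace instead
import * as pdfjs from 'pdfjs-dist';
import type { PDFDocumentProxy, PDFPageProxy } from 'pdfjs-dist';
// Default import works in older lru-cache releases; newer major versions export { LRUCache }
import LRU from 'lru-cache';

// Configure PDF.js worker (its version must match the library version)
pdfjs.GlobalWorkerOptions.workerSrc = `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.min.js`;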

// LRU cache for decoded pages (max 20 pages)
const pageCache = new LRU<number, HTMLCanvasElement>({ max: 20 });

type Props = {
  /** URL of the PDF to display */
  src: string;
};

export const PdfViewer: React.FC<Props> = ({ src }) => {
  const [pdf, setPdf] = useState<PDFDocumentProxy | null>(null);
  const [numPages, setNumPages] = useState(0);
  const [error, setError] = useState<string | null>(null);
  const containerRef = useRef<HTMLDivElement>(null);
  const scale = 1.5; // Adjust per zoom UI

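  // Load the PDF once per src
  useEffect(() => {
    let cancelled = false;
    // Cached canvases are keyed by page number only, so drop them when the document changes
    pageCache.clear();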
    pdfjs
      .getDocument({
        url: src,
        cMapUrl: `//cdn.jsdelivr.net/npm/pdfjs-dist@${pdfjs.version}/cmaps/`,
        cMapPacked: true,
      })
      .promise.then((doc) => {
        if (!cancelled) {
          setPdf(doc);
          setNumPages(doc.numPages);
        }
      })
      .catch((e) => !cancelled && setError(e.message));
    return () => {
      cancelled = true;
    };
  }, [src]);

  // Render a single page into a canvas (cached)
  const renderPage = useCallback(
    async (pageNumber: number, canvas: HTMLCanvasElement) => {
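      // Return cached canvas if present
      const cached = pageCache.get(pageNumber);
      if (cached) {
        // Size the target to match the cached bitmap; a new canvas defaults to 300×150
        canvas.width = cached.width;
        canvas.height = cached.height;
        canvas.getContext('2d')?.drawImage(cached, 0, 0);
        return;
      }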

      if (!pdf) return;
      const page: PDFPageProxy = await pdf.getPage(pageNumber);
      const viewport = page.getViewport({ scale });
      canvas.width = viewport.width;
      canvas.height = viewport.height;

      const renderContext = {
        canvasContext: canvas.getContext('2d')!,
        viewport,
      };
      await page.render(renderContext).promise;
      // Store a clone for fast reuse
      const clone = document.createElement('canvas');
      clone.width = viewport.width;
      clone.height = viewport.height;
      clone.getContext('2d')!.drawImage(canvas, 0, 0);
      pageCache.set(pageNumber, clone);
    },
    [pdf, scale]
  );

  // Row renderer for react-window
  const Row = ({
    index,
    style,
  }: {
    index: number;
    style: React.CSSProperties;
  }) => {
    const pageNumber = index + 1;
    const canvasRef = useRef<HTMLCanvasElement>(null);

    useEffect(() => {
      const canvas = canvasRef.current;
      if (!canvas) return;
      // Clear any previous pixels before the page renders
      canvas.getContext('2d')?.clearRect(0, 0, canvas.width, canvas.height);
      renderPage(pageNumber, canvas).catch(() => {
        // optional: draw error placeholder
      });
    }, [pageNumber, renderPage]);

    return (
      <div style={style}>
        <canvas ref={canvasRef} style={{ width: '100%' }} />
      </div>
    );
  };

  if (error) return <div role="alert">Failed to load PDF: {error}</div>;
  if (!pdf) return <div>Loading document…</div>;

  // Height estimate: use a constant or compute based on first page
  const PAGE_HEIGHT = 800; // px – adjust for your design

  return (
    <div ref={containerRef} style={{ height: '100%', overflow: 'auto' }}>
      <List
        height={containerRef.current?.clientHeight ?? 600}
        itemCount={numPages}
        itemSize={PAGE_HEIGHT}
        width="100%"
      >
        {Row}
      </List>
    </div>
  );
};

Key architectural decisions in src/components/PdfViewer.tsx include:

  • Worker configuration – Setting pdfjs.GlobalWorkerOptions.workerSrc moves parsing off the UI thread.
  • Lazy instantiation – Each Row component renders only when visible within the virtualized list.
  • Canvas recycling – The Row component receives a fresh canvas element from the virtualized list, but the renderPage function draws from the LRU cache when available.
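The viewer above hard-codes PAGE_HEIGHT, and its comment suggests computing the value from the first page instead. A minimal sketch of that calculation, assuming the canvas is stretched to the container width as in the Row component; rowHeightFor and its gap parameter are hypothetical:

```typescript
// Row height for a page drawn at the full container width, preserving aspect ratio.
function rowHeightFor(
  pageWidth: number,      // page width at scale 1
  pageHeight: number,     // page height at scale 1
  containerWidth: number, // width of the list container in CSS pixels
  gap = 0,                // hypothetical vertical spacing between pages
): number {
  if (pageWidth <= 0 || pageHeight <= 0 || containerWidth <= 0) {
    throw new RangeError('dimensions must be positive');
  }
  // Round up so the bottom edge of the page is never clipped.
  return Math.ceil((containerWidth * pageHeight) / pageWidth) + gap;
}
```

The page dimensions could come from the width and height of page.getViewport({ scale: 1 }) for the first page. This assumes every page has the same size; documents with mixed page sizes would need a variable-height list such as react-window's VariableSizeList.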

Memory Optimization Strategies

Fine-tuning memory management prevents browser crashes when users navigate large technical manuals or scanned archives.

LRU Caching in src/utils/pageCache.ts

Repeatedly decoding the same page when a user scrolls back up wastes CPU cycles. Create a dedicated cache module in src/utils/pageCache.ts to store cloned canvas elements:

import LRU from 'lru-cache';

export const pageCache = new LRU<number, HTMLCanvasElement>({
  max: 20,
  ttl: 1000 * 60 * 5, // 5 minutes
});

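Limiting the cache to 20 pages bounds memory usage while providing instant back-scrolling for recently viewed content. As a rough guide, a US Letter page (612 × 792 points) rendered at scale 1.5 produces a 918 × 1188 canvas, about 4.4 MB of RGBA pixels, so 20 cached pages hold roughly 87 MB. The cache keys pages by number only, so clear it whenever the document or zoom level changes, or extend the key to include both.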
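For readers who want to see what an LRU cache does internally, or who need a composite key, the following is a minimal sketch that does not depend on the lru-cache package. SimpleLru and pageKey are hypothetical names, and the key format is only one possible choice:

```typescript
// Minimal LRU cache built on Map, which iterates keys in insertion order.
class SimpleLru<V> {
  private readonly entries = new Map<string, V>();
  private readonly max: number;

  constructor(max: number) {
    if (max < 1) throw new RangeError('max must be at least 1');
    this.max = max;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.max) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Hypothetical key format: document URL, page number, and scale together,
// so a canvas rendered for another document or zoom level is never reused.
function pageKey(src: string, pageNumber: number, scale: number): string {
  return `${src}#${pageNumber}@${scale.toFixed(2)}`;
}
```

A viewer would store canvases with set(pageKey(src, pageNumber, scale), canvas). Unlike the lru-cache configuration above, this sketch has no time-based expiry.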

Zoom and DPI Management

Canvas memory grows with the square of the effective scale (zoom level multiplied by the device pixel ratio), so rendering at native resolution on high-DPI displays for every zoom level quickly becomes expensive. Debounce zoom changes and limit the maximum scale factor to prevent excessive pixel buffers:

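const [scale, setScale] = useState(1.5);
// useDebounce is a custom hook (not part of React) that returns the value
// only after it has stopped changing for the given delay in milliseconds.
const debouncedScale = useDebounce(scale, 300);

// In renderPage:
const viewport = page.getViewport({ scale: debouncedScale });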

Because pdfjs re-rasterizes the page's vector content at the requested scale, output stays crisp at any zoom level. The canvas backing store grows with the scale, however, so cap it to avoid oversized pixel buffers that could crash the browser tab.
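One way to enforce such a cap is to derive the render scale from a pixel budget rather than from a fixed maximum zoom. A minimal sketch, assuming a budget of 4096 × 4096 pixels per canvas; the budget and the name clampRenderScale are illustrative assumptions, not PDF.js settings:

```typescript
// Largest render scale that keeps one page's canvas under a pixel budget.
function clampRenderScale(
  requestedScale: number,
  pageWidth: number,        // page width at scale 1
  pageHeight: number,       // page height at scale 1
  devicePixelRatio = 1,
  maxPixels = 4096 * 4096,  // hypothetical budget per canvas
): number {
  const effective = requestedScale * devicePixelRatio;
  const pixels = pageWidth * effective * pageHeight * effective;
  if (pixels <= maxPixels) return effective;
  // Pixel count grows with scale squared, so shrink by the square root of the excess.
  return effective * Math.sqrt(maxPixels / pixels);
}
```

The returned value would be passed to page.getViewport({ scale }), with the canvas sized down via CSS to the intended on-screen dimensions. Each pixel costs four bytes in an RGBA canvas, so the assumed budget corresponds to about 64 MiB per page.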

Accessibility and Error Handling

A production viewer must support assistive technologies and degrade gracefully when files fail to load.

Text Layer Implementation

Screen readers cannot interpret pixels drawn on a canvas; they need real text nodes in the DOM. Enable the text layer by appending an invisible overlay in src/components/PdfViewer.tsx:

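// The wrapper around the canvas must be positioned (e.g. position: relative)
// so the absolutely positioned text layer lines up with the page.
const textLayerDiv = document.createElement('div');
textLayerDiv.className = 'pdf-text-layer';
canvas.parentNode?.appendChild(textLayerDiv);

const textContent = await page.getTextContent();
// renderTextLayer and its option names vary between pdfjs-dist releases
// (newer versions provide a TextLayer class instead); check your installed version.
await pdfjs.renderTextLayer({
  textContent,
  container: textLayerDiv,
  viewport: page.getViewport({ scale }),
  textDivs: [],
}).promise;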

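Accompany this with CSS in src/styles/pdfViewer.css that positions the layer absolutely over the canvas. With pointer-events: none the layer stays in the DOM for assistive technology but cannot be selected with the mouse; drop that rule if users should be able to select and copy text: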

.pdf-text-layer {
  position: absolute;
  top: 0;
  left: 0;
  pointer-events: none;
  opacity: 0;
}

Graceful Degradation

Handle corrupted PDFs in src/hooks/usePdfDocument.ts by catching errors and exposing a retry mechanism:

import { useEffect, useState } from 'react';
import * as pdfjs from 'pdfjs-dist';
import type { PDFDocumentProxy } from 'pdfjs-dist';

export const usePdfDocument = (src: string) => {
  const [pdf, setPdf] = useState<PDFDocumentProxy | null>(null);
  const [error, setError] = useState<string | null>(null);
  const [retryCount, setRetryCount] = useState(0);

  useEffect(() => {
    let cancelled = false;
    setError(null); // clear the previous failure before each attempt

    pdfjs.getDocument({ url: src }).promise
      .then(doc => { if (!cancelled) setPdf(doc); })
      .catch(err => { if (!cancelled) setError(err.message); });

    return () => { cancelled = true; };
  }, [src, retryCount]); // bumping retryCount re-runs the load

  const retry = () => setRetryCount(c => c + 1);

  return { pdf, error, retry };
};

This pattern lets the UI show a meaningful error message with a retry action, and optionally a download link, rather than a blank screen when pdfjs encounters a malformed file.
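Unlimited retries rarely help with a genuinely corrupted file, so the UI may want to stop offering retry after a few attempts and fall back to a download link. A minimal sketch of that decision as a pure reducer, which could be used with React's useReducer; the state shape, action names, and MAX_ATTEMPTS value are all hypothetical:

```typescript
type LoadState =
  | { status: 'loading'; attempt: number }
  | { status: 'ready'; attempt: number }
  | { status: 'failed'; attempt: number; message: string; canRetry: boolean };

type LoadAction =
  | { type: 'loaded' }
  | { type: 'error'; message: string }
  | { type: 'retry' };

const MAX_ATTEMPTS = 3; // hypothetical cap before offering only a download link

function loadReducer(state: LoadState, action: LoadAction): LoadState {
  switch (action.type) {
    case 'loaded':
      return { status: 'ready', attempt: state.attempt };
    case 'error':
      return {
        status: 'failed',
        attempt: state.attempt,
        message: action.message,
        canRetry: state.attempt < MAX_ATTEMPTS,
      };
    case 'retry':
      // Ignore retries unless the last attempt failed and the cap is not reached.
      if (state.status !== 'failed' || !state.canRetry) return state;
      return { status: 'loading', attempt: state.attempt + 1 };
  }
}
```

When status is 'failed' and canRetry is false, the component would render the download fallback instead of a retry button.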

Summary

  • Offload parsing to a Web Worker via pdfjs.GlobalWorkerOptions.workerSrc to prevent UI freezing.
  • Virtualize the page list using react-window to limit DOM nodes to visible items only.
  • Cache rendered canvases in an LRU store defined in src/utils/pageCache.ts to speed up backward scrolling while capping memory at 20 pages.
  • Implement chunked loading by rendering pages only when they enter the viewport, so the document is processed piece by piece rather than all at once, and debounce zoom controls to keep canvas sizes in check on high-DPI displays.
  • Add accessibility through invisible text layers in src/components/PdfViewer.tsx and ARIA landmarks, and handle errors gracefully with retry logic in src/hooks/usePdfDocument.ts.

Frequently Asked Questions

How do I prevent the browser from freezing when loading a 500-page PDF?

Move PDF parsing to a Web Worker by setting pdfjs.GlobalWorkerOptions.workerSrc, and wrap your page list in a virtualized component like react-window. This combination ensures only 5–10 pages exist in the DOM at any time, and heavy decoding happens off the main thread.

What is the best way to handle zoom functionality without memory leaks?

Debounce zoom state changes (300ms) and pass the scale factor to page.getViewport({scale}). When the scale changes, clear the LRU cache or key it by [pageNumber, scale] to prevent stale canvases from accumulating. Limit maximum scale to 3× to avoid creating oversized pixel buffers.

How can I make the PDF content accessible to screen readers?

Enable the text layer by calling pdfjs.renderTextLayer() after rendering the canvas. Append the resulting div to the page container with absolute positioning. This invisible overlay contains real text nodes that screen readers can parse while the visual canvas remains intact. Keep pointer-events: none only if mouse selection is not needed, because that rule prevents users from selecting the text.

Should I use Canvas or SVG for rendering PDF pages in React?

Use Canvas for general-purpose viewing because it offers faster rasterization and lower memory overhead, which matters most for large documents. Text selection does not require SVG, since the text layer placed over the canvas provides it. Reserve SVG for use cases that need DOM manipulation of individual vector elements, check that your pdfjs version still supports SVG output, and note that SVG performance degrades with complex page geometries.
