In Progress · Developer Tools

PageForge

A client-side document pagination engine built as a TypeScript library for browser rendering.

Phases 0–2b complete
PDF.js and docx-preview rendering pipelines
Paragraph splitting via Range API

The Problem

Browser-based document viewers typically render documents as continuous scrolling content, ignoring page boundaries. For legal tools where page numbers and layout matter — comparing redlines, reviewing productions, annotating exhibits — accurate pagination is essential.

No lightweight, client-side pagination library existed that handled both DOCX and PDF inputs with correct paragraph splitting across page breaks.

Approach

PageForge is a TypeScript library built with Rollup, housed in the DocDiff/RedlineIQ repository as a foundational dependency. The architecture centers on a rendering pipeline that produces paginated output matching the source document's layout as closely as possible.

Phases completed:

  • Phase 0 — PDF.js rendering pipeline with page-accurate output
  • Phase 1 — DOCX rendering via docx-preview with style preservation
  • Phase 2a — Paragraph splitting via the Range API, allowing content to break cleanly across page boundaries
  • Phase 2b — Image-plus-text layout fixes, handling mixed content blocks correctly
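
As a rough illustration of the Phase 2a idea, the sketch below finds where a paragraph could be split so that its first part still fits on the current page. It is not PageForge's actual code: `findSplitOffset`, `snapToWordBoundary`, and the `fits` callback are hypothetical names, and the binary search assumes that once a prefix overflows the page, every longer prefix overflows too.

```typescript
// Hypothetical helper: returns the largest offset in [0, length] for which
// fits(offset) is true. Assumes fits is monotonic (true up to some offset,
// false afterwards) and that the empty prefix (offset 0) always fits.
function findSplitOffset(
  length: number,
  fits: (offset: number) => boolean
): number {
  let lo = 0; // largest offset known (or assumed) to fit
  let hi = length; // largest offset still worth testing
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (fits(mid)) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

// Hypothetical helper: moves a split offset back to the start of the word it
// falls in, so a word is not cut in half. If there is no earlier space, the
// offset is returned unchanged and the word is split as a fallback.
function snapToWordBoundary(text: string, offset: number): number {
  if (offset <= 0) return 0;
  if (offset >= text.length) return text.length;
  if (text[offset] === " ") return offset; // already between words
  const lastSpace = text.lastIndexOf(" ", offset - 1);
  return lastSpace === -1 ? offset : lastSpace + 1;
}
```

In a browser, and for a paragraph made of a single text node, `fits` could be backed by a Range: call `range.setStart(textNode, 0)` and `range.setEnd(textNode, offset)`, then compare `range.getBoundingClientRect().bottom` with the bottom edge of the page. Passing the measurement in as a callback keeps the search itself free of DOM access, which makes it easy to test.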

Phase 2c (table row splitting with thead cloning) is next — the most complex layout case remaining.

Key constraints: the PDF.js CDN is blocked on file:// pages, which requires a bundled distribution. SPLITTABLE_TAGS is intentionally limited to p, li, blockquote, and pre to prevent layout instability on complex elements.
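
A minimal sketch of how such an allowlist might be checked follows. Only the four tag names come from the description above; the `Set` representation, the `isSplittable` and `overflowAction` helpers, and the "move to next page" fallback are assumptions made for illustration.

```typescript
// Tag names as listed in the text; storing them in a Set is an assumption.
const SPLITTABLE_TAGS: ReadonlySet<string> = new Set([
  "p",
  "li",
  "blockquote",
  "pre",
]);

// Hypothetical guard. Element.tagName is upper case in HTML documents,
// so the name is normalized before the lookup.
function isSplittable(tagName: string): boolean {
  return SPLITTABLE_TAGS.has(tagName.toLowerCase());
}

type OverflowAction = "split" | "move-to-next-page";

// Hypothetical policy: split allowlisted blocks, and push anything else
// (tables, figures, and so on) to the next page as a whole.
function overflowAction(tagName: string): OverflowAction {
  return isSplittable(tagName) ? "split" : "move-to-next-page";
}
```

Keeping the list small means an unexpected element type is moved as a whole rather than split incorrectly.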

Outcome

Phases 0–2b are complete and integrated into RedlineIQ's rendering pipeline. The library correctly paginates multi-page DOCX and PDF documents in the browser with paragraph-level splitting.

What I Learned

The Range API is powerful but requires careful handling of DOM mutations — splitting a paragraph mid-render can invalidate earlier range references in ways that are hard to debug.
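
One common way to sidestep this kind of invalidation, shown here as a general technique rather than as what PageForge does, is to record split points as plain offsets before touching the DOM and then apply them from last to first. The sketch below uses a plain string as a stand-in for a text node; `splitAtOffsets` is a hypothetical name.

```typescript
// Splits `text` at every offset in `offsets`. All offsets refer to positions
// in the ORIGINAL text. Cuts are applied in descending order, so each cut only
// removes content after it and the remaining, smaller offsets stay valid
// without any re-basing.
function splitAtOffsets(text: string, offsets: number[]): string[] {
  const sorted = [...new Set(offsets)]
    .filter((o) => o > 0 && o < text.length) // ignore cuts that produce nothing
    .sort((a, b) => b - a); // last cut first

  const parts: string[] = [];
  let remaining = text;
  for (const offset of sorted) {
    parts.unshift(remaining.slice(offset)); // tail becomes its own piece
    remaining = remaining.slice(0, offset); // head keeps its original offsets
  }
  parts.unshift(remaining);
  return parts;
}
```

For example, `splitAtOffsets("abcdefghij", [3, 7])` yields the pieces `"abc"`, `"defg"`, and `"hij"`. Applying the cuts in ascending order would instead require subtracting the length already removed from every later offset, which is the bookkeeping that tends to go wrong when text nodes are split during a render pass.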