A drop-in React component that renders a PDF (including pre-signed S3 URLs) and highlights all occurrences of one or more search terms with pixel-accurate overlays and hover tooltips. Built on top of react-pdf (pdf.js).
-
<PdfHighlighter />component:- Displays a PDF inline in your React app.
- Highlights matches for one or many search terms.
- Ultra-precise geometry using the rendered text layer + DOM Ranges (correct across split spans, kerning, ligatures, and line wraps).
- Transparent yellow (or per-term colors), tooltip on hover.
- Works with pre-signed S3 URLs (1-hour validity etc.).
npm i react-pdf pdfjs-distNode ≥ 18 recommended.
react-pdf must use the same pdf.js worker version you installed.
// e.g. in your PdfHighlighter.jsx
import { pdfjs } from "react-pdf";
import workerSrc from "pdfjs-dist/build/pdf.worker.mjs?url";
pdfjs.GlobalWorkerOptions.workerSrc = workerSrc;import "react-pdf/dist/Page/AnnotationLayer.css";
import "react-pdf/dist/Page/TextLayer.css";import PdfHighlighter from "./components/PdfHighlighter";
export default function Example() {
const url = "/docs/sample.pdf"; // or pre-signed S3 URL
const terms = ["submit", "Virginia Tech"]; // one or many
return <PdfHighlighter url={url} searchTexts={terms} scale={1.25} />;
}That’s it—highlights should render inline.
<PdfHighlighter
url={string} // PDF url (supports S3 pre-signed)
searchText={string} // single term (optional)
searchTexts={string[]} // multiple terms (optional)
scale={number} // zoom factor; default 1.25
/>Notes:
- Pass either
searchTextorsearchTexts(the latter wins if both provided). - Terms are trimmed and de-duplicated.
- Matching is case-insensitive by default and tolerant of line breaks and end-of-line hyphens (e.g.,
sub-\nmit→ “submit”).
-
react-pdf:
- Streams bytes to the pdf.js worker, renders the page graphics onto a
<canvas>. - Builds a text layer (
.react-pdf__Page__textContent) of absolutely positioned<span>s so text is selectable.
- Streams bytes to the pdf.js worker, renders the page graphics onto a
-
Our overlay:
-
For each page, we add an absolutely positioned overlay
<div>above the text layer. -
We walk the text layer DOM (text nodes in visual order) to build a virtual page string and an index mapping each virtual character →
{node, offset}. -
For each search term:
- We normalize whitespace (so multi-line queries work) and find all matches.
- For each match we create a DOM
Range, callrange.getClientRects()to obtain the actual on-screen rectangles, and draw translucent<div>s at those coordinates.
-
A
MutationObserverrecomputes highlights when the text layer finishes rendering or reflows (e.g., after zoom).
-
This yields pixel-perfect highlights without guessing character widths.
Alignment is identical regardless of origin. Ensure:
-
Content-Type: application/pdf -
Bucket CORS allows your app origin (GET/HEAD).
-
Optional response headers when generating the URL:
response-content-type=application/pdfresponse-content-disposition=inline
Example CORS rule (minimal):
[
{
"AllowedOrigins": ["https://your-app.example"],
"AllowedMethods": ["GET", "HEAD"],
"AllowedHeaders": ["*"],
"ExposeHeaders": [
"Accept-Ranges",
"Content-Range",
"Content-Length",
"ETag"
],
"MaxAgeSeconds": 300
}
]- Color per term — the component hashes each term to an HSL color. Swap the
colorForTerm(term)helper to enforce your design tokens. - Tooltip — shown on hover; easy to replace with your UI library (e.g., Radix Tooltip). The tooltip content is the matched term (React escapes text by default).
This POC is fast for typical docs. For very long PDFs:
- Compute on visible pages only
Add an
IntersectionObserverto run highlight computation for pages within/near the viewport. - Debounce inputs
Debounce
searchText(s)changes (150–250ms) to avoid recomputing on every keystroke. - Avoid unnecessary re-renders
Keep
url,scale, andsearchTextsstable where possible; memoize term arrays.
- Tooltips supplement but do not replace native selection/accessibility of the text layer.
- Highlight overlays use
pointer-events: autoonly for hover; they otherwise allow text selection beneath. - If keyboard tooltip access is required, use a11y-ready tooltip components.
- Requires browsers that support
Range.getClientRects()(all modern evergreen browsers). - Mobile Safari works, but heavy PDFs + many highlights may benefit from the perf tips above.
“Setting up fake worker” / worker version mismatch
Ensure you import the worker from your installed pdfjs-dist and remove any CDN worker lines.
import workerSrc from "pdfjs-dist/build/pdf.worker.mjs?url";
pdfjs.GlobalWorkerOptions.workerSrc = workerSrc;Text loads but “Found 0 matches”
The text layer may not be ready. The component uses a MutationObserver to recompute on text layer population; if you fork it, keep that logic. Also confirm .react-pdf__Page__textContent exists in the DOM.
Highlights offset / misaligned Ensure:
- The overlay is a sibling inside the same per-page wrapper with
position: relative. - You’re importing the correct
TextLayer.css. - You’re not wrapping the page with transforms that change its bounding box.
CORS errors with S3
Set bucket CORS as above and verify the URL is not expired. Avoid file:// URLs; serve via HTTP.
Scanned PDFs (no text) No text layer → nothing to match. You’d need OCR (out of scope for this POC).
- Pre-signed S3 URLs are short-lived; the component simply reads them—it doesn’t store or proxy.
- React escapes tooltip text; do not dangerously set HTML.
- Do not log full pre-signed URLs in production logs.
src/
components/
PdfHighlighter.jsx # the component (overlay + observer + rendering)
PdfHighlighter.css # highlight + tooltip styles
utils/
domHighlights.js # multi-term, line-break/ hyphen-aware DOM geometry
- Add the files above.
- Ensure the worker import and CSS imports exist exactly once (e.g., inside the component).
- Render:
<PdfHighlighter
url={s3Url} // or public path
searchTexts={["submit", "Yes/True", "Virginia Tech"]}
scale={1.25}
/>- (Optional) Wrap in your design system shell and wire search inputs to
searchTexts.
- Case-sensitive / whole-word toggles (regex in the finder).
- Next/Prev navigation & scroll-into-view.
- Persist/export highlight geometry (page, rects) for analytics/audit.
- Compute-on-visibility (IntersectionObserver) for very long docs.