pdfium provides idiomatic R bindings to Google’s PDFium
engine — the same library that
powers Chrome’s PDF viewer. It has two halves:
- a read surface that exposes vector-path geometry (stroke, fill, Bezier control points, transformation matrices) alongside text, fonts, images, annotations, form fields, attachments, signatures, structure tree, and rendering. No other CRAN package currently exposes this path geometry.
- a mutation surface (opt-in via
readwrite = TRUE) that lets you rotate / reorder / merge pages, draw fresh page objects, create and edit annotations, fill form fields, and add file attachments — then save the result.

Typical use cases:
- Auditing PDF figures (which lines, which colors, which fonts).
- Extracting curves from regulatory filings and scientific publications.
- Building PDF normalization pipelines that need geometry, not just text.
- Filling AcroForm fields programmatically and flattening the result for downstream tooling.
- Authoring programmatic PDFs from vector graphics, JPEG images,
text in the 14 standard fonts or any TrueType / Type1 typeface, and
annotations (think: figure callouts, table reports, annotated source
documents).
/Info-dict writes and on-save encryption are the remaining v0.1.0 gaps; both need upstream PDFium changes that we’ve proposed but Google hasn’t shipped yet.
- Anything you’d otherwise drop into Python with pypdfium2.
See
vignette("mutating-pdfs")
for a walkthrough of the writer surface, and
vignette("comparison")
for how pdfium lines up against pdftools, qpdf, magick,
tabulizer, and staplr.
First CRAN release (0.1.0). The public API is documented on the
pkgdown site and exercised at
100% R coverage; architectural decisions for the release are recorded
under dev/decisions/.
pdfium downloads its libpdfium binary from
bblanchon/pdfium-binaries
at install time. The pinned version lives in tools/pdfium-version.txt.
If your install runs without internet access, set PDFIUM_OFFLINE=1 and
place the matching tarball under inst/pdfium-binaries/ before
installing.
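
The offline steps above can be checked before installing. The following is a minimal sketch in base R, assuming you are working from an unpacked source checkout; `offline_install_ready()` and its `pkg_dir` argument are hypothetical names used for illustration and are not part of pdfium. The tarball's exact file name is not stated here, so the check only confirms that something has been placed in the directory.

```r
# Hypothetical helper (not part of pdfium): report whether a source checkout
# looks ready for an offline install.
# `pkg_dir` is assumed to be the root of an unpacked pdfium source tree.
offline_install_ready <- function(pkg_dir) {
  version_file <- file.path(pkg_dir, "tools", "pdfium-version.txt")
  binary_dir <- file.path(pkg_dir, "inst", "pdfium-binaries")

  if (!file.exists(version_file)) {
    stop("Missing version pin: ", version_file)
  }

  # Assumption: the pinned version is on the first line of the file.
  pinned <- trimws(readLines(version_file, warn = FALSE)[1])

  # Any file counts as a candidate tarball; the real file name is not
  # specified, so this check is deliberately loose.
  tarballs <- list.files(binary_dir)

  list(pinned = pinned, tarballs = tarballs, ready = length(tarballs) > 0)
}

# Usage (not run); "path/to/rpdfium" is a placeholder for your checkout:
# status <- offline_install_ready("path/to/rpdfium")
# if (status$ready) {
#   Sys.setenv(PDFIUM_OFFLINE = "1")
#   install.packages("path/to/rpdfium", repos = NULL, type = "source")
# }
```

Setting the variable with `Sys.setenv()` affects only the current R session and the processes it starts, which is usually enough for an install launched from that session.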
# Release version (once on CRAN):
install.packages("pdfium")
# Development version:
remotes::install_github("humanpred/rpdfium")

library(pdfium)
doc <- pdf_doc_open(system.file("extdata", "fixtures", "minimal.pdf",
  package = "pdfium"
))
pdf_page_count(doc)
pdf_doc_close(doc)

More examples ship in the vignettes
(vignette("getting-started", package = "pdfium"), etc.) and on the
pkgdown site.
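
When the open, count, close sequence above runs inside a function, it can help to guarantee the close even if an error occurs in between. The sketch below uses only the three calls shown in the example. `count_pages()` is a hypothetical wrapper, not a pdfium function, and it assumes `pdf_doc_close()` accepts the handle returned by `pdf_doc_open()` exactly as in that example.

```r
# Hypothetical wrapper (not part of pdfium): count pages and always close
# the document, even if an error is raised after opening.
count_pages <- function(path) {
  doc <- pdfium::pdf_doc_open(path)
  # on.exit() runs when the function exits, normally or through an error.
  on.exit(pdfium::pdf_doc_close(doc), add = TRUE)
  pdfium::pdf_page_count(doc)
}

# Run only when the package is installed.
if (requireNamespace("pdfium", quietly = TRUE)) {
  fixture <- system.file("extdata", "fixtures", "minimal.pdf",
    package = "pdfium"
  )
  print(count_pages(fixture))
}
```

Registering the close with `on.exit()` right after opening keeps the cleanup next to the acquisition, so later edits to the function body cannot skip it.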
pdfium is MIT-licensed. The bundled libpdfium binary is BSD-3-Clause
and is not distributed in the source tarball — see
LICENSE.md and
dev/decisions/ADR-003-binary-distribution.md.