humanpred/rpdfium


pdfium

R-CMD-check Codecov test coverage Lifecycle: experimental CRAN status

pdfium provides idiomatic R bindings to Google’s PDFium engine — the same library that powers Chrome’s PDF viewer. It has two halves:

  • a read surface that exposes vector-path geometry (stroke, fill, Bezier control points, transformation matrices) alongside text, fonts, images, annotations, form fields, attachments, signatures, the structure tree, and rendering. No other CRAN package surfaces this path geometry today.
  • a mutation surface (opt-in via readwrite = TRUE) that lets you rotate / reorder / merge pages, draw fresh page objects, create and edit annotations, fill form fields, and add file attachments — then save the result.

What it is for

  • Auditing PDF figures (which lines, which colors, which fonts).
  • Extracting curves from regulatory filings and scientific publications.
  • Building PDF normalization pipelines that need geometry, not just text.
  • Filling AcroForm fields programmatically and flattening the result for downstream tooling.
  • Authoring programmatic PDFs from vector graphics, JPEG images, text in the 14 standard fonts or any TrueType / Type1 typeface, and annotations (for example figure callouts, table reports, and annotated source documents). Writing the /Info dictionary and encrypting on save are the remaining v0.1.0 gaps; both need upstream PDFium changes that we’ve proposed but Google hasn’t shipped yet.
  • Anything you’d otherwise drop into Python with pypdfium2.

See vignette("mutating-pdfs") for a walkthrough of the writer surface, and vignette("comparison") for how pdfium lines up against pdftools, qpdf, magick, tabulizer, and staplr.

Status

First CRAN release (0.1.0). The public API is documented on the pkgdown site and its R code has 100% test coverage; architectural decisions for the release are recorded under dev/decisions/.

Installation

pdfium downloads its libpdfium binary from bblanchon/pdfium-binaries at install time. The pinned version lives in tools/pdfium-version.txt. If your install runs without internet access, set PDFIUM_OFFLINE=1 and place the matching tarball under inst/pdfium-binaries/ before installing.

# Release version (once on CRAN):
install.packages("pdfium")

# Development version:
remotes::install_github("humanpred/rpdfium")
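
For an offline install, the steps described above might look roughly like the sketch below. It assumes you are working from a local checkout of the source, and that the first line of tools/pdfium-version.txt holds the pinned version (the file's exact layout is an assumption). `offline_prereqs()` is a hypothetical helper written for this illustration, not part of pdfium; it only reports what it finds and does not guess the tarball name.

```r
# Hypothetical helper (not part of pdfium): inspect a local source checkout
# before attempting an offline install.
offline_prereqs <- function(src_dir) {
  version_file <- file.path(src_dir, "tools", "pdfium-version.txt")
  binary_dir <- file.path(src_dir, "inst", "pdfium-binaries")
  if (!file.exists(version_file)) {
    stop("No tools/pdfium-version.txt under ", src_dir)
  }
  list(
    # Assumption: the pinned version is on the first line of the file.
    pinned_version = trimws(readLines(version_file, n = 1L, warn = FALSE)),
    # Whatever tarballs have been placed under inst/pdfium-binaries/.
    tarballs = list.files(binary_dir)
  )
}

# Usage, assuming the working directory is a checkout of humanpred/rpdfium.
if (file.exists(file.path("tools", "pdfium-version.txt"))) {
  prereqs <- offline_prereqs(".")
  if (length(prereqs$tarballs) > 0L) {
    # The environment variable is inherited by the install subprocess.
    Sys.setenv(PDFIUM_OFFLINE = "1")
    install.packages(".", repos = NULL, type = "source")
  }
}
```

Checking for the tarball first means a missing binary is noticed before the build starts rather than partway through it.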

Example

library(pdfium)

doc <- pdf_doc_open(system.file("extdata", "fixtures", "minimal.pdf",
  package = "pdfium"
))
pdf_page_count(doc)
pdf_doc_close(doc)
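
If the code between open and close can fail, one option is to wrap the pattern above so the document is always closed. The sketch below uses only the three functions shown in the example; `with_pdf_doc()` is a hypothetical helper, not part of pdfium, and its `opener` / `closer` arguments exist only so the helper can be tried without the package installed.

```r
# Hypothetical helper (not part of pdfium): open a document, run `fn` on it,
# and close it again even if `fn` signals an error.
with_pdf_doc <- function(path, fn,
                         opener = pdfium::pdf_doc_open,
                         closer = pdfium::pdf_doc_close) {
  doc <- opener(path)
  # on.exit() runs when the function exits, normally or via an error.
  on.exit(closer(doc), add = TRUE)
  fn(doc)
}

# Usage with the bundled fixture, only if pdfium is installed.
if (requireNamespace("pdfium", quietly = TRUE)) {
  fixture <- system.file("extdata", "fixtures", "minimal.pdf",
    package = "pdfium"
  )
  n_pages <- with_pdf_doc(fixture, pdfium::pdf_page_count)
  print(n_pages)
}
```

Because the defaults are evaluated lazily, passing your own `opener` and `closer` never touches the pdfium namespace.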

More examples ship in the vignettes (vignette("getting-started", package = "pdfium"), etc.) and on the pkgdown site.

License

pdfium is MIT-licensed. The bundled libpdfium binary is BSD-3-Clause and is not distributed in the source tarball — see LICENSE.md and dev/decisions/ADR-003-binary-distribution.md.
