tabulapdf: An R Package to Extract Tables from PDF Documents
2024-08-25Unverified0· sign in to hype
Mauricio Vargas Sepúlveda, Thomas J. Leeper, Tom Paskhalis, Manuel Aristarán, Jeremy B. Merrill, Mike Tigas
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
tabulapdf is an R package that utilizes the Tabula Java library to import tables from PDF files directly into R. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a Shiny interface, enabling manual areas selection with a computer mouse for data retrieval.