tabulapdf: An R Package to Extract Tables from PDF Documents

2024-08-25

Mauricio Vargas Sepúlveda, Thomas J. Leeper, Tom Paskhalis, Manuel Aristarán, Jeremy B. Merrill, Mike Tigas

Abstract

tabulapdf is an R package that uses the Tabula Java library to import tables from PDF files directly into R. It can reduce the time and effort spent on data extraction in fields such as investigative journalism. The package supports both automatic and manual table extraction; the manual mode is provided through a Shiny interface in which the user selects table areas with the mouse.
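
The two extraction modes described above might be used roughly as follows. This is a minimal sketch, not code from the paper: it assumes that tabulapdf and a Java runtime are installed, the file name `report.pdf` and the two wrapper functions are hypothetical, and the manual mode only works in an interactive R session.

```r
# Minimal sketch; assumes the tabulapdf package and a Java runtime are installed.
# "report.pdf" and both wrapper names are hypothetical, chosen for illustration.

extract_page_tables <- function(pdf_path, page = 1) {
  stopifnot(file.exists(pdf_path))
  # Automatic mode: Tabula guesses where the tables are on the given page.
  tabulapdf::extract_tables(pdf_path, pages = page, guess = TRUE)
}

extract_selected_areas <- function(pdf_path, page = 1) {
  stopifnot(file.exists(pdf_path), interactive())
  # Manual mode: opens an interactive widget where the table area is drawn
  # with the mouse, then extracts only the selected region.
  tabulapdf::extract_areas(pdf_path, pages = page)
}

# Example call, run only when the hypothetical file is actually present.
if (interactive() && file.exists("report.pdf")) {
  tables <- extract_page_tables("report.pdf", page = 1)
  str(tables)
}
```

Wrapping the calls in small functions keeps the sketch loadable even where the package is not installed; the Tabula calls are only reached when a function is invoked on an existing file.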
