
PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction

2026-03-17

Brandon Smock, Valerie Faucon-Morin, Max Sokolov, Libin Liang, Tayyibah Khanam, Amrit Ramesh, Maury Courtland


Abstract

Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in developing methods, such as vision-language models (VLMs), that can extract tables directly in their full page or document context. However, progress has been difficult to demonstrate due to a lack of annotated data. To address this, we create a new large-scale dataset, PubTables-v2. PubTables-v2 supports a number of challenging table extraction tasks. Notably, it is the first large-scale benchmark for multi-page table structure recognition. We evaluate several smaller specialized VLMs to establish baseline performance on these tasks. As we show, multi-page table recognition is a key gap in current models' capabilities. Interestingly, we show that introducing an image classifier that predicts when to merge tables across pages can significantly improve performance. Data, code, and models will be released at https://huggingface.co/datasets/kensho/PubTables-v2.
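The abstract notes that an image classifier predicting when to merge tables across pages significantly improves multi-page recognition. The paper's actual architecture is not described here, so the following is only a minimal sketch of the merge-decision step under assumed interfaces: each page yields a recognized table fragment (a list of rows), and a hypothetical classifier supplies, for each page, the probability that its table continues the table from the previous page.

```python
def merge_across_pages(page_tables, continues_prob, threshold=0.5):
    """Merge per-page table fragments using continuation probabilities.

    page_tables    -- list of per-page table fragments, each a list of rows
    continues_prob -- continues_prob[i] is the (hypothetical) classifier's
                      probability that page i's table continues page i-1's;
                      the value for the first page is ignored
    threshold      -- decision boundary for merging (an assumed default)
    """
    merged = []
    for i, table in enumerate(page_tables):
        if merged and continues_prob[i] >= threshold:
            # Classifier says this fragment continues the previous table:
            # append its rows to the most recently emitted table.
            merged[-1].extend(table)
        else:
            # Otherwise start a new table.
            merged.append(list(table))
    return merged


# Example: the fragment on page 2 is judged a continuation of page 1's
# table, while page 3 starts a fresh table.
result = merge_across_pages(
    [["row_a"], ["row_b"], ["row_c"]],
    [0.0, 0.9, 0.1],
)
```

Any real system would feed rendered page crops (or the regions around the page break) to a trained binary classifier to produce `continues_prob`; the function names and interfaces above are illustrative assumptions, not the authors' implementation.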
