Instilling Type Knowledge in Language Models via Multi-Task QA

2022-04-28 · Findings (NAACL) 2022

Shuyang Li, Mukund Sridhar, Chandana Satya Prakash, Jin Cao, Wael Hamza, Julian McAuley

Code

github.com/amazon-research/wikiwiki-dataset
Official implementation (linked in the paper)

Abstract

Understanding human language often necessitates understanding entities and their place in a taxonomy of knowledge -- their types. Previous methods to learn entity types rely on training classifiers on datasets with coarse, noisy, and incomplete labels. We introduce a method to instill fine-grained type knowledge in language models with text-to-text pre-training on type-centric questions leveraging knowledge base documents and knowledge graphs. We create the WikiWiki dataset: entities and passages from 10M Wikipedia articles linked to the Wikidata knowledge graph with 41K types. Models trained on WikiWiki achieve state-of-the-art performance in zero-shot dialog state tracking benchmarks, accurately infer entity types in Wikipedia articles, and can discover new types deemed useful by human judges.
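As a rough illustration of what "text-to-text pre-training on type-centric questions" could look like in practice, the sketch below turns one entity record into (input, target) string pairs. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `EntityRecord` class, the `make_type_qa_pairs` function, the two question templates, and the example types are all hypothetical and do not come from the paper or the WikiWiki repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EntityRecord:
    """Hypothetical record: an entity, a passage mentioning it, and its knowledge-graph types."""
    entity: str
    passage: str
    types: List[str]


def make_type_qa_pairs(record: EntityRecord) -> List[Tuple[str, str]]:
    """Build (input_text, target_text) pairs for a text-to-text model.

    The prompt templates are illustrative assumptions, not the paper's templates.
    """
    pairs: List[Tuple[str, str]] = []

    # Assumed task 1: generate all types of the entity from its passage.
    question = (
        f"question: what is the type of {record.entity}? "
        f"context: {record.passage}"
    )
    pairs.append((question, ", ".join(record.types)))

    # Assumed task 2: confirm that a single candidate type applies.
    for type_name in record.types:
        verify = (
            f"question: is {record.entity} a {type_name}? "
            f"context: {record.passage}"
        )
        pairs.append((verify, "yes"))

    return pairs


# Example record; the type labels are made up for illustration.
record = EntityRecord(
    entity="Seattle",
    passage="Seattle is a seaport city on the West Coast of the United States.",
    types=["city", "port settlement"],
)
pairs = make_type_qa_pairs(record)
for input_text, target_text in pairs:
    print(input_text, "->", target_text)
```

Framing several question styles as plain string pairs is what lets a single sequence-to-sequence model train on them jointly. A real setup would also need negative examples for the verification-style question, which this sketch omits.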

Tasks

Articles, Dialog State Tracking, Knowledge Graphs, Vocal Bursts Type Prediction
