XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

2020-05-01 · EMNLP 2020 · Code Available

Edoardo Maria Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen


Code

github.com/cambridgeltl/xcopa
Official · In paper · ★ 105

Abstract

In order to simulate human language capacity, natural language processing systems must be able to reason about the dynamics of everyday situations, including their possible causes and effects. Moreover, they should be able to generalise the acquired world knowledge to new languages, modulo cultural differences. Advances in machine reasoning and cross-lingual transfer depend on the availability of challenging evaluation benchmarks. Motivated by both demands, we introduce Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages, which includes resource-poor languages like Eastern Apurímac Quechua and Haitian Creole. We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods based on multilingual pretraining and zero-shot fine-tuning falls short compared to translation-based transfer. Finally, we propose strategies to adapt multilingual models to out-of-sample resource-lean languages where only a small corpus or a bilingual dictionary is available, and report substantial improvements over the random baseline. The XCOPA dataset is freely available at github.com/cambridgeltl/xcopa.

Tasks

Cross-Lingual Transfer, Translation, World Knowledge

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
XCOPA	RoBERTa Large (translate test)	Accuracy	76.05	—	Unverified
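
The Accuracy metric in the table is the percentage of items for which a model picks the more plausible of two alternatives. A minimal sketch of that scoring follows, assuming COPA-style items with a premise, two alternatives, a cause/effect question type, and a 0/1 label. The `CopaItem` class, the toy items, and the `always_first` predictor are hypothetical illustrations written for this sketch; they are not taken from the XCOPA repository or data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CopaItem:
    """One two-alternative item (field names are an assumption)."""
    premise: str
    choice1: str
    choice2: str
    question: str  # "cause" or "effect"
    label: int     # 0 -> choice1 is more plausible, 1 -> choice2


def accuracy(items: Sequence[CopaItem],
             predict: Callable[[CopaItem], int]) -> float:
    """Return the percentage of items where predict() matches the label."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum(1 for item in items if predict(item) == item.label)
    return 100.0 * correct / len(items)


def always_first(item: CopaItem) -> int:
    """Trivial predictor that always answers choice1 (hypothetical baseline)."""
    return 0


# Illustrative items written for this sketch, not drawn from XCOPA.
TOY_ITEMS = [
    CopaItem("The ground was wet.", "It had rained.",
             "The sun was shining.", "cause", 0),
    CopaItem("The glass fell off the table.", "It floated upward.",
             "It shattered on the floor.", "effect", 1),
    CopaItem("The man turned on the tap.", "Water flowed out.",
             "The lights went off.", "effect", 0),
    CopaItem("The child started crying.", "She ate her lunch.",
             "She scraped her knee.", "cause", 1),
]

if __name__ == "__main__":
    print(f"{accuracy(TOY_ITEMS, always_first):.2f}")
```

Because each item has exactly two alternatives, a predictor that ignores the input scores about 50% on a label-balanced set, which is the reference point for the random baseline mentioned in the abstract.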
