WiC-TSV: An Evaluation Benchmark for Target Sense Verification of Words in Context
Anna Breit, Artem Revenko, Kiamehr Rezaee, Mohammad Taher Pilehvar, Jose Camacho-Collados
Code
- github.com/semantic-web-company/wic-tsv (official, PyTorch)
Abstract
We present WiC-TSV, a new multi-domain evaluation benchmark for Word Sense Disambiguation. More specifically, we introduce a framework for Target Sense Verification of Words in Context, whose uniqueness lies in its formulation as a binary classification task, making it independent of external sense inventories, and in its coverage of various domains. This makes the dataset highly flexible for evaluating a diverse set of models and systems in and across domains. WiC-TSV provides three different evaluation settings, depending on the input signals provided to the model. We set baseline performance on the dataset using state-of-the-art language models. Experimental results show that even though these models can perform decently on the task, a gap remains between machine and human performance, especially in out-of-domain settings. The WiC-TSV data is available at https://competitions.codalab.org/competitions/23683.
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| WiC-TSV | BERT-base | Task 1 accuracy (all) | 75.3 | — | Unverified |
| WiC-TSV | Unsupervised BERT | Task 1 accuracy (all) | 54.4 | — | Unverified |
| WiC-TSV | FastText | Task 1 accuracy (all) | 53.7 | — | Unverified |
| WiC-TSV | All-true baseline | Task 1 accuracy (all) | 50.8 | — | Unverified |
| WiC-TSV | Human | Task 3 accuracy (all) | 85.3 | — | Unverified |
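To make the binary formulation concrete, the sketch below encodes a Target Sense Verification instance (a context, a target word, and a proposed sense given by a definition and hypernyms) and implements the trivial all-true baseline from the table above, which predicts a positive label for every instance and therefore scores the fraction of positive examples. The field names and the toy instances are assumptions for illustration; the actual dataset format may differ.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TSVInstance:
    context: str          # sentence containing the target word
    target: str           # the target word whose sense is verified
    definition: str       # definition of the proposed sense
    hypernyms: List[str]  # hypernyms of the proposed sense
    label: bool           # True if the target is used in the proposed sense

# Toy instances (invented for illustration, not taken from WiC-TSV)
data = [
    TSVInstance("He sat on the bank of the river.", "bank",
                "sloping land beside a body of water",
                ["slope", "incline"], True),
    TSVInstance("He deposited money at the bank.", "bank",
                "sloping land beside a body of water",
                ["slope", "incline"], False),
]

def all_true_accuracy(instances: List[TSVInstance]) -> float:
    """Accuracy of the baseline that labels every context-sense pair a match."""
    return sum(inst.label for inst in instances) / len(instances)

print(all_true_accuracy(data))  # 0.5 on this balanced toy set
```

On the real test set this baseline reaches 50.8, indicating the positive and negative classes are nearly balanced; a useful model must therefore clearly exceed chance-level accuracy.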