Context vs Target Word: Quantifying Biases When Applying Models to Lexical Semantic Datasets
Anonymous
Abstract
State-of-the-art contextualized models such as BERT use tasks such as WiC and WSD to evaluate their word-in-context representations. This inherently assumes that performance on these tasks reflects how well a model represents the coupled word and context semantics. This study investigates that assumption by presenting the first quantitative analysis (using probing baselines) of the context-word interaction tested in major contextual lexical semantic tasks with dramatically different emphases. We find that models often exhibit excessive context or target-word biases: they solve tasks like WiC almost purely as context classification, and rely on target words alone when tackling medical entity linking. In the latter task, where the domain reduces ambiguity, context does not improve model performance and can in fact degrade it. Our case study on WiC reveals that human subjects do not share models' strong context biases (humans found semantic judgments much more difficult when the target word was missing), and that models are learning spurious correlations from context alone. This study demonstrates that, on these tasks, models are usually not being tested for word-in-context representations as such, and results are therefore open to misinterpretation. We recommend our framework as a sanity check for context and target-word biases in future task design and application in lexical semantics.
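The probing baselines mentioned above can be illustrated with input ablations: for a WiC-style example, one can build a context-only variant (target word masked out) and a target-only variant (context discarded), so that a model's accuracy on each variant reveals which signal it actually exploits. The function name, signature, and mask token below are illustrative assumptions, not the paper's exact implementation:

```python
# Hypothetical sketch of context/target-word ablation probes for a
# WiC-style input. Names and the mask token are illustrative only.

def make_probes(sentence: str, target: str, mask_token: str = "[MASK]"):
    """Return (full, context_only, target_only) input variants.

    - full:         the original sentence (word in context).
    - context_only: the target word replaced by a mask token, so any
                    remaining signal comes from the context alone.
    - target_only:  the bare target word, with all context discarded.
    """
    words = sentence.split()
    context_only = " ".join(
        mask_token if w.lower().strip(".,;:!?") == target.lower() else w
        for w in words
    )
    return sentence, context_only, target

full, ctx, tgt = make_probes("The bank approved the loan.", "bank")
# ctx is "The [MASK] approved the loan."; tgt is "bank"
```

Comparing a model trained and evaluated on each variant against its score on the full input quantifies how much of its performance is attributable to context or to the target word alone.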