SOTAVerified

Natural Language Inference

Natural language inference (NLI) is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".

Example:

| Premise | Label | Hypothesis |
| --- | --- | --- |
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
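As a minimal sketch, the premise/hypothesis/label triples from the example can be held in a small record type and scored with plain accuracy, the metric most NLI leaderboards report. The names `NLIExample`, `accuracy`, and `always_entailment` are hypothetical and chosen only for illustration; they do not come from any particular library, and the constant-prediction baseline stands in for a real model.

```python
from dataclasses import dataclass

# The three NLI classes described in the task definition.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its gold label (hypothetical type)."""
    premise: str
    hypothesis: str
    label: str

    def __post_init__(self):
        # Reject anything outside the three-way label set.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


# The three rows from the example table.
EXAMPLES = [
    NLIExample(
        "A man inspects the uniform of a figure in some East Asian country.",
        "The man is sleeping.",
        "contradiction",
    ),
    NLIExample(
        "An older and younger man smiling.",
        "Two men are smiling and laughing at the cats playing on the floor.",
        "neutral",
    ),
    NLIExample(
        "A soccer game with multiple males playing.",
        "Some men are playing a sport.",
        "entailment",
    ),
]


def accuracy(predictions, examples):
    """Percentage of predictions that match the gold labels."""
    if not examples or len(predictions) != len(examples):
        raise ValueError("need one prediction per example, and at least one example")
    correct = sum(pred == ex.label for pred, ex in zip(predictions, examples))
    return 100.0 * correct / len(examples)


def always_entailment(example):
    """Hypothetical baseline that ignores its input; a stand-in for a model."""
    return "entailment"


predictions = [always_entailment(ex) for ex in EXAMPLES]
score = accuracy(predictions, EXAMPLES)
print(f"{score:.1f}% accuracy")
```

A real system would replace `always_entailment` with a classifier that reads both the premise and the hypothesis; the scoring function stays the same.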

Approaches to NLI range from earlier symbolic and statistical methods to more recent deep learning methods. Benchmark datasets for NLI include SNLI, MultiNLI, and SciTail, among others. You can get hands-on practice on the SNLI task by following this d2l.ai chapter.


Papers

Showing 1701–1750 of 1961 papers

| Title | Status | Hype |
| --- | --- | --- |
| SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference | Code | 0 |
| Shortcut-Stacked Sentence Encoders for Multi-Domain Inference | Code | 0 |
| Uncovering Values: Detecting Latent Moral Content from Natural Language with Explainable and Non-Trained Methods | Code | 0 |
| SICK-NL: A Dataset for Dutch Natural Language Inference | Code | 0 |
| PARMA: A Predicate Argument Aligner | Code | 0 |
| SICKNL: A Dataset for Dutch Natural Language Inference | Code | 0 |
| Ckylark: A More Robust PCFG-LA Parser | Code | 0 |
| An Evaluation Framework for Mapping News Headlines to Event Classes in a Knowledge Graph | Code | 0 |
| Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models | Code | 0 |
| Introducing a Lexicon of Verbal Polarity Shifters for English | Code | 0 |
| AttesTable at SemEval-2021 Task 9: Extending Statement Verification with Tables for Unknown Class, and Semantic Evidence Finding | Code | 0 |
| Patient Trajectory Prediction: Integrating Clinical Notes with Transformers | Code | 0 |
| Investigating Multi-source Active Learning for Natural Language Inference | Code | 0 |
| Investigating Reasons for Disagreement in Natural Language Inference | Code | 0 |
| Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing | Code | 0 |
| Investigating the Robustness of Modelling Decisions for Few-Shot Cross-Topic Stance Detection: A Preregistered Study | Code | 0 |
| Edinburgh Clinical NLP at SemEval-2024 Task 2: Fine-tune your model unless you have access to GPT-4 | Code | 0 |
| ECon: On the Detection and Resolution of Evidence Conflicts | Code | 0 |
| Character-level Intra Attention Network for Natural Language Inference | Code | 0 |
| Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training | Code | 0 |
| A CCG-based Compositional Semantics and Inference System for Comparatives | Code | 0 |
| EconNLI: Evaluating Large Language Models on Economics Reasoning | Code | 0 |
| Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation | Code | 0 |
| Is "My Favorite New Movie" My Favorite Movie? Probing the Understanding of Recursive Noun Phrases | Code | 0 |
| Simple and Effective Text Matching with Richer Alignment Features | Code | 0 |
| Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding | Code | 0 |
| Issues with Entailment-based Zero-shot Text Classification | Code | 0 |
| End-to-End Bias Mitigation by Modelling Biases in Corpora | Code | 0 |
| Downstream Trade-offs of a Family of Text Watermarks | Code | 0 |
| A Neural-Symbolic Approach to Natural Language Understanding | Code | 0 |
| Jack the Reader -- A Machine Reading Framework | Code | 0 |
| Ecologically Valid Explanations for Label Variation in NLI | Code | 0 |
| Persian Natural Language Inference: A Meta-learning approach | Code | 0 |
| PerspectroScope: A Window to the World of Diverse Perspectives | Code | 0 |
| Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models | Code | 0 |
| Characterizing and Measuring Linguistic Dataset Drift | Code | 0 |
| Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks | Code | 0 |
| Character-based Neural Networks for Sentence Pair Modeling | Code | 0 |
| Plausible Extractive Rationalization through Semi-Supervised Entailment Signal | Code | 0 |
| Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0 |
| An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference | Code | 0 |
| Certified Robustness to Adversarial Word Substitutions | Code | 0 |
| Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms | Code | 0 |
| A Study of fastText Word Embedding Effects in Document Classification in Bangla Language | Code | 0 |
| SILT: Efficient transformer training for inter-lingual inference | Code | 0 |
| Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive Summarization | Code | 0 |
| Causal Abstractions of Neural Networks | Code | 0 |
| Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations | Code | 0 |
| For Generated Text, Is NLI-Neutral Text the Best Text? | Code | 0 |
| Thieves on Sesame Street! Model Extraction of BERT-based APIs | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | UnitedSynT5 (3B) | % Test Accuracy | 94.7 | | Unverified |
| 2 | UnitedSynT5 (335M) | % Test Accuracy | 93.5 | | Unverified |
| 3 | EFL (Entailment as Few-shot Learner) + RoBERTa-large | % Test Accuracy | 93.1 | | Unverified |
| 4 | Neural Tree Indexers for Text Understanding | % Test Accuracy | 93.1 | | Unverified |
| 5 | RoBERTa-large+Self-Explaining | % Test Accuracy | 92.3 | | Unverified |
| 6 | RoBERTa-large + self-explaining layer | % Test Accuracy | 92.3 | | Unverified |
| 7 | CA-MTL | % Test Accuracy | 92.1 | | Unverified |
| 8 | SemBERT | % Test Accuracy | 91.9 | | Unverified |
| 9 | MT-DNN-SMARTLARGEv0 | % Test Accuracy | 91.7 | | Unverified |
| 10 | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy | 91.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Vega v2 6B (KD-based prompt transfer) | Accuracy | 96 | | Unverified |
| 2 | PaLM 540B (fine-tuned) | Accuracy | 95.7 | | Unverified |
| 3 | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 94.1 | | Unverified |
| 4 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 93.5 | | Unverified |
| 5 | DeBERTa-1.5B | Accuracy | 93.2 | | Unverified |
| 6 | MUPPET Roberta Large | Accuracy | 92.8 | | Unverified |
| 7 | DeBERTaV3large | Accuracy | 92.7 | | Unverified |
| 8 | T5-XXL 11B | Accuracy | 92.5 | | Unverified |
| 9 | T5-XXL 11B (fine-tuned) | Accuracy | 92.5 | | Unverified |
| 10 | ST-MoE-L 4.1B (fine-tuned) | Accuracy | 92.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | UnitedSynT5 (3B) | Matched | 92.6 | | Unverified |
| 2 | Turing NLR v5 XXL 5.4B (fine-tuned) | Matched | 92.6 | | Unverified |
| 3 | T5-XXL 11B (fine-tuned) | Matched | 92 | | Unverified |
| 4 | T5 | Matched | 92 | | Unverified |
| 5 | T5-11B | Mismatched | 91.7 | | Unverified |
| 6 | T5-3B | Matched | 91.4 | | Unverified |
| 7 | ALBERT | Matched | 91.3 | | Unverified |
| 8 | DeBERTa (large) | Matched | 91.1 | | Unverified |
| 9 | Adv-RoBERTa ensemble | Matched | 91.1 | | Unverified |
| 10 | SMARTRoBERTa | Dev Matched | 91.1 | | Unverified |