SOTAVerified

Sentence Completion

Papers

Showing 21-30 of 91 papers

| Title | Status | Hype |
|---|---|---|
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | Code | 1 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | Code | 1 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | Code | 1 |
| Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals | Code | 1 |
| HONEST: Measuring Hurtful Sentence Completion in Language Models | Code | 1 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Code | 1 |
| GePpeTto Carves Italian into a Language Model | Code | 1 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Code | 1 |
| Evaluating Gender Bias in Large Language Models | | 0 |
| KatzBot: Revolutionizing Academic Chatbot for Enhanced Communication | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CompassMTL 567M with Tailor | Accuracy | 96.1 | | Unverified |
| 2 | CompassMTL 567M | Accuracy | 95.6 | | Unverified |
| 3 | DeBERTa-Large 304M (classification-based) | Accuracy | 95.6 | | Unverified |
| 4 | GPT-4 (10-shot) | Accuracy | 95.3 | | Unverified |
| 5 | LLaMA3+MoSLoRA | Accuracy | 95 | | Unverified |
| 6 | LLaMA-2 13B + MixLoRA | Accuracy | 94.7 | | Unverified |
| 7 | DeBERTa-Large 304M | Accuracy | 94.7 | | Unverified |
| 8 | Unicorn 11B (fine-tuned) | Accuracy | 93.9 | | Unverified |
| 9 | LLaMA-3 8B + MixLoRA | Accuracy | 93.3 | | Unverified |
| 10 | LLaMA-2 7B + MixLoRA | Accuracy | 93.1 | | Unverified |