SOTAVerified

Multiple-choice

Papers

Showing 841-850 of 1107 papers

Title | Status | Hype
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | | 0
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | | 0
ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising | | 0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | | 0
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | | 0
Changing Answer Order Can Decrease MMLU Accuracy | | 0
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks | | 0
What Makes Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | | 0
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | | 0
An Improved Traditional Chinese Evaluation Suite for Foundation Model | | 0

No leaderboard results yet.