SOTAVerified

Benchmarking

Papers

Showing 1491-1500 of 5548 papers

Title | Status | Hype
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models | Code | 1
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | | 0
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment | Code | 1
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | Code | 2
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond | Code | 1
Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models | Code | 0
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English | Code | 0
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback | Code | 0
A Comparative Analysis on Ethical Benchmarking in Large Language Models | | 0
Enterprise Benchmarks for Large Language Model Evaluation | Code | 0
Page 150 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified