SOTAVerified

Benchmarking

Papers

Showing 2001 to 2010 of 5548 papers

Title | Status | Hype
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective | | 0
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | Code | 2
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | | 0
Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Code | 3
BeHonest: Benchmarking Honesty in Large Language Models | Code | 1
Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN | Code | 0
M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere | Code | 0
Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications | | 0
Exploring and Benchmarking the Planning Capabilities of Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified