SOTAVerified

Benchmarking

Papers

Showing 11-20 of 5548 papers

Title | Status | Hype
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference | Code | 7
TaskBench: Benchmarking Large Language Models for Task Automation | Code | 6
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Code | 5
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | Code | 5
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation | Code | 5
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Code | 5
The BrowserGym Ecosystem for Web Agent Research | Code | 5
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Code | 5
TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods | Code | 5
Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Code | 5

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified