SOTAVerified

Benchmarking

Papers

Showing 11–20 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7 |
| TaskBench: Benchmarking Large Language Models for Task Automation | Code | 6 |
| CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X | Code | 5 |
| AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Code | 5 |
| TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods | Code | 5 |
| The BrowserGym Ecosystem for Web Agent Research | Code | 5 |
| SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation | Code | 5 |
| OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Code | 5 |
| Benchmarking the Myopic Trap: Positional Bias in Information Retrieval | Code | 5 |
| Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Code | 5 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |