SOTAVerified

Benchmarking

Papers

Showing 851–875 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Code | 1 |
| AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets | Code | 1 |
| A Closer Look at Mortality Risk Prediction from Electrocardiograms | Code | 1 |
| Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? | Code | 1 |
| A Survey of Pathology Foundation Model: Progress and Future Directions | Code | 1 |
| CharacterBench: Benchmarking Character Customization of Large Language Models | Code | 1 |
| An Empirical Study on Google Research Football Multi-agent Scenarios | Code | 1 |
| A Comprehensive Benchmark for RNA 3D Structure-Function Modeling | Code | 1 |
| IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics | Code | 1 |
| GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation | Code | 1 |
| EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search | Code | 1 |
| IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics | Code | 1 |
| EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography | Code | 1 |
| End-to-end Knowledge Retrieval with Multi-modal Queries | Code | 1 |
| An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction | Code | 1 |
| Benchmarking Batch Deep Reinforcement Learning Algorithms | Code | 1 |
| Benchmarking machine learning models on multi-centre eICU critical care dataset | Code | 1 |
| CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1 |
| Ego-Body Pose Estimation via Ego-Head Pose Estimation | Code | 1 |
| CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1 |
| A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Code | 1 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1 |
| JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes | Code | 1 |
| Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking | Code | 1 |
| Benchmarking Low-Shot Robustness to Natural Distribution Shifts | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |