SOTAVerified

Benchmarking

Papers

Showing 901-925 of 5548 papers

Title | Status | Hype
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1
Benchmarking Constraint Inference in Inverse Reinforcement Learning | Code | 1
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics | Code | 1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1
Benchmarking large language models for biomedical natural language processing applications and recommendations | Code | 1
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Code | 1
Benchmarking Counterfactual Image Generation | Code | 1
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials | Code | 1
CLoG: Benchmarking Continual Learning of Image Generation Models | Code | 1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials | Code | 1
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care | Code | 1
Benchmarking Compositionality with Formal Languages | Code | 1
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1
Benchmarking Data Science Agents | Code | 1
LEMUR Neural Network Dataset: Towards Seamless AutoML | Code | 1
ClearPose: Large-scale Transparent Object Dataset and Benchmark | Code | 1
LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation | Code | 1
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models | Code | 1
MC-Blur: A Comprehensive Benchmark for Image Deblurring | Code | 1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver | Code | 1
Light Field Salient Object Detection: A Review and Benchmark | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified