SOTAVerified

Benchmarking

Papers

Showing 901–925 of 5548 papers

Title | Status | Hype
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels | Code | 1
Comprehensive benchmarking of large language models for RNA secondary structure prediction | Code | 1
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems | Code | 1
AD-LLM: Benchmarking Large Language Models for Anomaly Detection | Code | 1
LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Code | 1
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models | Code | 1
Benchmarking Counterfactual Image Generation | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild | Code | 1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Code | 1
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition | Code | 1
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry | Code | 1
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Code | 1
Benchmarking Data Science Agents | Code | 1
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models | Code | 1
Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms | Code | 1
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Code | 1
A Closer Look at Mortality Risk Prediction from Electrocardiograms | Code | 1
MC-Blur: A Comprehensive Benchmark for Image Deblurring | Code | 1
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics | Code | 1
Benchmarking Multidomain English-Indonesian Machine Translation | Code | 1
EntQA: Entity Linking as Question Answering | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified