SOTAVerified

Benchmarking

Papers

Showing 5476-5500 of 5548 papers

Title | Status | Hype
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Code | 0
A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection | Code | 0
ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures | Code | 0
Benchmarking and Rethinking Knowledge Editing for Large Language Models | Code | 0
CLEAVE: Scalable and Edge-native Benchmarking of Networked Control Systems | Code | 0
Quantitative Metrics for Benchmarking Human-Aware Robot Navigation | Code | 0
Benchmarking and optimizing organism wide single-cell RNA alignment methods | Code | 0
XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification | Code | 0
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models | Code | 0
Benchmarking and Improving Text-to-SQL Generation under Ambiguity | Code | 0
Quantum Boosting using Domain-Partitioning Hypotheses | Code | 0
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs | Code | 0
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Code | 0
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations | Code | 0
TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Code | 0
A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers | Code | 0
Adversarial Environment Generation for Learning to Navigate the Web | Code | 0
A*3D Dataset: Towards Autonomous Driving in Challenging Environments | Code | 0
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring | Code | 0
Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Code | 0
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample | Code | 0
Quaternion Capsule Networks | Code | 0
QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results | Code | 0
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs | Code | 0
Question-Answering Dense Video Events | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified