SOTAVerified

Benchmarking

Papers

Showing 1421-1430 of 5548 papers

Title | Status | Hype
Autonomous Microscopy Experiments through Large Language Model Agents | Code | 1
EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner | Code | 1
IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics | Code | 1
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Code | 1
Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability | Code | 1
Enhancing Biomedical Relation Extraction with Directionality | Code | 1
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms | Code | 1
BEND: Benchmarking DNA Language Models on biologically meaningful tasks | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified