SOTAVerified

Benchmarking

Papers

Showing 391-400 of 5548 papers

Title | Status | Hype
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation | - | 0
SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds | Code | 0
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges | Code | 0
Benchmarking CFAR and CNN-based Peak Detection Algorithms in ISAC under Hardware Impairments | - | 0
Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale | - | 0
ASR-FAIRBENCH: Measuring and Benchmarking Equity Across Speech Recognition Systems | - | 0
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models | - | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | Code | 0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified