SOTAVerified

Benchmarking

Papers

Showing 701-710 of 5548 papers

Title | Status | Hype
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | - | 0
Writing as a testbed for open ended agents | - | 0
The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs | Code | 1
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Code | 0
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages | Code | 0
Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages | - | 0
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | - | 0
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Code | 1
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified