SOTAVerified

Benchmarking

Papers

Showing 2191-2200 of 5548 papers

Title | Status | Hype
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation | | 0
MULTITAT: Benchmarking Multilingual Table-and-Text Question Answering | Code | 0
SynthRAD2025 Grand Challenge dataset: generating synthetic CTs for radiotherapy | | 0
Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement | | 0
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties | Code | 0
Overconfident Oracles: Limitations of In Silico Sequence Design Benchmarking | | 0
On Neural Inertial Classification Networks for Pedestrian Activity Recognition | | 0
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science | Code | 0
VidLBEval: Benchmarking and Mitigating Language Bias in Video-Involved LVLMs | | 0
VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified