SOTAVerified

Benchmarking

Papers

Showing 491–500 of 5548 papers

Title | Status | Hype
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness | Code | 1
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks | Code | 1
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction | Code | 1
QCPINN: Quantum-Classical Physics-Informed Neural Networks for Solving PDEs | Code | 1
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination | Code | 1
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System | Code | 1
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos | Code | 1
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Code | 1
GNNs as Predictors of Agentic Workflow Performances | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified