SOTAVerified

Benchmarking

Papers

Showing 2501–2525 of 5548 papers

Title | Status | Hype
Benchmarking Scientific Image Forgery Detectors |  | 0
Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam |  | 0
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra |  | 0
Benchmarking Sample Selection Strategies for Batch Reinforcement Learning |  | 0
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking |  | 0
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking |  | 0
Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation |  | 0
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition |  | 0
7th AI Driving Olympics: 1st Place Report for Panoptic Tracking |  | 0
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals |  | 0
A Theory of Dynamic Benchmarks |  | 0
GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy |  | 0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models |  | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games |  | 0
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness |  | 0
GeoNet: Benchmarking Unsupervised Adaptation across Geographies |  | 0
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management |  | 0
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation |  | 0
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency |  | 0
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval |  | 0
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities |  | 0
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models |  | 0
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models |  | 0
AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals |  | 0
Geometry-Based Next Frame Prediction from Monocular Video |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified