Benchmarking

Papers


Showing 2501–2525 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Scientific Image Forgery Detectors	May 26, 2021	Benchmarking	Unverified
Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam	Apr 9, 2021	Benchmarking, Scene Text Recognition	Unverified
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra	Jun 9, 2025	3D Reconstruction, Benchmarking	Unverified
Benchmarking Sample Selection Strategies for Batch Reinforcement Learning	Sep 29, 2021	Benchmarking, Imitation Learning	Unverified
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking	Feb 28, 2023	Adversarial Robustness, Benchmarking	Unverified
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking	Feb 19, 2025	Benchmarking	Unverified
Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation	Dec 16, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition	Jan 10, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
7th AI Driving Olympics: 1st Place Report for Panoptic Tracking	Dec 9, 2021	Benchmarking, Panoptic Segmentation	Unverified
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals	May 30, 2025	Benchmarking, Earth Observation	Unverified
A Theory of Dynamic Benchmarks	Oct 6, 2022	Benchmarking	Unverified
GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy	Jul 25, 2024	Benchmarking	Unverified
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games	Aug 28, 2024	Atari Games, Benchmarking	Unverified
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness	May 5, 2023	Benchmarking, Dataset Distillation	Unverified
GeoNet: Benchmarking Unsupervised Adaptation across Geographies	Mar 27, 2023	Benchmarking, Domain Adaptation	Unverified
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management	Jun 19, 2023	Benchmarking, Deep Reinforcement Learning	Unverified
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation	Mar 2, 2022	Benchmarking, Deep Learning	Unverified
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency	Sep 12, 2019	Benchmarking, General Classification	Unverified
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval	Jan 15, 2025	Benchmarking, Contrastive Learning	Unverified
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities	Jun 6, 2023	Benchmarking, Depth Completion	Unverified
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models	Jun 17, 2024	Benchmarking, Survey	Unverified
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models	Jun 3, 2023	Benchmarking	Unverified
AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals	May 21, 2025	Benchmarking, Chatbot	Unverified
Geometry-Based Next Frame Prediction from Monocular Video	Sep 20, 2016	Autonomous Driving, Benchmarking	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified