Benchmarking

Papers

Showing 3576–3600 of 5548 papers

Title	Date	Tasks	Status
On the relationship between Benchmarking, Standards and Certification in Robotics and AI	Sep 21, 2023	Benchmarking	Unverified
On the Reliability and Validity of Detecting Approval of Political Actors in Tweets	Nov 1, 2020	Benchmarking, Sentiment Analysis	Unverified
On the Robustness of Human-Object Interaction Detection against Distribution Shift	Jun 22, 2025	Benchmarking, Data Augmentation	Unverified
On the role of benchmarking data sets and simulations in method comparison studies	Aug 2, 2022	Benchmarking	Unverified
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning	Oct 25, 2019	Benchmarking	Unverified
On the Use of Quality Diversity Algorithms for The Traveling Thief Problem	Dec 16, 2021	Benchmarking, Diversity	Unverified
On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds	Jan 1, 2025	Benchmarking	Unverified
On the Value of ML Models	Dec 13, 2021	Benchmarking	Unverified
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images	Apr 17, 2023	3D Pose Estimation, Benchmarking	Unverified
OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations	Dec 3, 2024	Benchmarking, Face Recognition	Unverified
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Open-CD: A Comprehensive Toolbox for Change Detection	Jul 22, 2024	Benchmarking, Change Detection	Unverified
OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI	Apr 4, 2023	Benchmarking	Unverified
Open Datasets for Satellite Radio Resource Control	Apr 22, 2024	Benchmarking, Decision Making	Unverified
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation	Apr 18, 2025	Benchmarking	Unverified
OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion	Jan 16, 2024	Benchmarking	Unverified
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety	Mar 18, 2024	Benchmarking, Mathematical Reasoning	Unverified
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation	Feb 25, 2025	Benchmarking, Semantic Segmentation	Unverified
Open foundation models for Azerbaijani language	Jul 2, 2024	Benchmarking	Unverified
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs	Oct 16, 2024	Benchmarking	Unverified
Open Llama2 Model for the Lithuanian Language	Aug 23, 2024	Benchmarking, model	Unverified
OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning	Sep 11, 2022	Benchmarking, Classification	Unverified
Open-set object detection: towards unified problem formulation and benchmarking	Nov 8, 2024	Autonomous Driving, Benchmarking	Unverified
OpenSiteRec: An Open Dataset for Site Recommendation	Jul 3, 2023	Benchmarking, Information Retrieval	Unverified
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks	Jan 8, 2025	Benchmarking, Deep Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified