Benchmarking

Papers


Showing 4001–4050 of 5548 papers

Title	Date	Tasks	Status
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation	Feb 25, 2025	Benchmarking, Semantic Segmentation	Unverified
Open foundation models for Azerbaijani language	Jul 2, 2024	Benchmarking	Unverified
Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop	Jul 14, 2025	Benchmarking	Unverified
Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation	Jul 17, 2017	Benchmarking, Pose Estimation	Unverified
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs	Oct 16, 2024	Benchmarking	Unverified
Benchmarking and Enhancing Surgical Phase Recognition Models for Robotic-Assisted Esophagectomy	Dec 5, 2024	Benchmarking, Decoder	Unverified
Open Llama2 Model for the Lithuanian Language	Aug 23, 2024	Benchmarking, model	Unverified
OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning	Sep 11, 2022	Benchmarking, Classification	Unverified
Treatment Learning Causal Transformer for Noisy Image Classification	Mar 29, 2022	Benchmarking, Classification	Unverified
Benchmarking and Enhancing Disentanglement in Concept-Residual Models	Nov 30, 2023	Benchmarking, Disentanglement	Unverified
Benchmarking and Comparing Multi-exposure Image Fusion Algorithms	Jul 30, 2020	Benchmarking, Multi-Exposure Image Fusion	Unverified
Tree Instance Segmentation With Temporal Contour Graph	Jan 1, 2023	Benchmarking, Instance Segmentation	Unverified
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	Benchmarking, Chunking	Unverified
Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images	Jun 11, 2024	Benchmarking, GPU	Unverified
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest	Dec 20, 2023	Benchmarking, In-Context Learning	Unverified
Open-set object detection: towards unified problem formulation and benchmarking	Nov 8, 2024	Autonomous Driving, Benchmarking	Unverified
OpenSiteRec: An Open Dataset for Site Recommendation	Jul 3, 2023	Benchmarking, Information Retrieval	Unverified
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks	Jan 8, 2025	Benchmarking, Deep Learning	Unverified
Benchmarking and Analyzing Generative Data for Visual Recognition	Jul 25, 2023	Benchmarking, Retrieval	Unverified
Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design	Mar 27, 2023	Benchmarking, Edge-computing	Unverified
Benchmarking a (μ+λ) Genetic Algorithm with Configurable Crossover Probability	Jun 10, 2020	Benchmarking	Unverified
Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation	Jun 6, 2024	Benchmarking, Drug Discovery	Unverified
Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization	Sep 29, 2021	Bayesian Optimization, Benchmarking	Unverified
Benchmarking Algorithms for Automatic License Plate Recognition	Mar 27, 2022	Benchmarking, License Plate Recognition	Unverified
Scale MLPerf-0.6 models on Google TPU-v3 Pods	Sep 21, 2019	Benchmarking	Unverified
Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation	Aug 10, 2023	Attribute, Benchmarking	Unverified
Opposition based Ensemble Micro Differential Evolution	Sep 8, 2017	Benchmarking, Diversity	Unverified
Trial-Based Dominance Enables Non-Parametric Tests to Compare both the Speed and Accuracy of Stochastic Optimizers	Dec 19, 2022	Benchmarking, Stochastic Optimization	Unverified
Optimal Eco-driving Control of Autonomous and Electric Trucks in Adaptation to Highway Topography: Energy Minimization and Battery Life Extension	Sep 10, 2020	Benchmarking, Model Predictive Control	Unverified
Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning	Jan 21, 2025	Benchmarking, Continual Learning	Unverified
Optimal PMU Placement for Kalman Filtering of DAE Power System Models	Feb 5, 2025	Benchmarking, State Estimation	Unverified
Optimal Scheduling of Anticipated COVID-19 Vaccination: A Case Study of New York State	Aug 24, 2020	Benchmarking, Scheduling	Unverified
Optimization of Genomic Classifiers for Clinical Deployment: Evaluation of Bayesian Optimization to Select Predictive Models of Acute Infection and In-Hospital Mortality	Mar 27, 2020	Bayesian Optimization, Benchmarking	Unverified
Optimization Techniques for a Physical Model of Human Vocalisation	Sep 26, 2023	Benchmarking	Unverified
Optimizing open-domain question answering with graph-based retrieval augmented generation	Mar 4, 2025	Benchmarking, Language Modeling	Unverified
Benchmarking air-conditioning energy performance of residential rooms based on regression and clustering techniques	Aug 22, 2019	Benchmarking, Clustering	Unverified
Optimizing Recommendations using Fine-Tuned LLMs	May 11, 2025	Benchmarking, Recommendation Systems	Unverified
OPTION: OPTImization Algorithm Benchmarking ONtology	Apr 24, 2021	Benchmarking, Data Integration	Unverified
OPTION: OPTImization Algorithm Benchmarking ONtology	Nov 21, 2022	Benchmarking, Data Integration	Unverified
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol	Mar 7, 2025	Benchmarking, Bug fixing	Unverified
Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring	Nov 27, 2024	Benchmarking, Earth Observation	Unverified
Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning	Dec 5, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks	Apr 2, 2025	Benchmarking, Language Modeling	Unverified
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images	Jan 25, 2024	Benchmarking, Segmentation	Unverified
OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery	Oct 25, 2024	Benchmarking, image-classification	Unverified
Organ-aware Multi-scale Medical Image Segmentation Using Text Prompt Engineering	Mar 18, 2025	Benchmarking, Descriptive	Unverified
Benchmarking Aggression Identification in Social Media	Aug 1, 2018	Aggression Identification, Benchmarking	Unverified
Orthogonal Deep Features Decomposition for Age-Invariant Face Recognition	Oct 17, 2018	Age-Invariant Face Recognition, Benchmarking	Unverified
A critical look at the current train/test split in machine learning	Jun 8, 2021	Active Learning, Benchmarking	Unverified
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report	Oct 5, 2023	Benchmarking	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified