Benchmarking

Papers

Title	Date	Tasks	Status
Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors	Nov 21, 2023	Benchmarking	Unverified
Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks	Nov 23, 2022	Benchmarking, Deep Learning	Unverified
Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection	Jul 28, 2024	Benchmarking, Fake News Detection	Unverified
Off-policy Evaluation for Payments at Adyen	Jan 15, 2025	Benchmarking, Decision Making	Unverified
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation	Jul 11, 2023	Benchmarking, Causal Discovery	Unverified
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications	May 20, 2025	Benchmarking, Machine Translation	Unverified
OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics	Jun 12, 2025	Benchmarking	Unverified
IBB Traffic Graph Data: Benchmarking and Road Traffic Prediction Model	Aug 2, 2024	Benchmarking, Feature Engineering	Unverified
Benchmarking Azerbaijani Neural Machine Translation	Jul 29, 2022	Benchmarking, Domain Generalization	Unverified
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver	Nov 20, 2024	Benchmarking	Unverified
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking	Jun 6, 2024	6D Pose Estimation using RGB, Benchmarking	Unverified
Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims	Jul 22, 2021	AutoML, Benchmarking	Unverified
Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics	Sep 25, 2024	Benchmarking	Unverified
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics	Feb 18, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions	Dec 9, 2024	Benchmarking, Language Modeling	Unverified
Benchmarking Automated Review Response Generation for the Hospitality Domain	Dec 1, 2020	Benchmarking, Domain Adaptation	Unverified
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications	Apr 28, 2023	AutoML, Benchmarking	Unverified
OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB	Oct 9, 2024	Benchmarking, Diversity	Unverified
On Benchmarking Code LLMs for Android Malware Analysis	Apr 1, 2025	Benchmarking, Malware Analysis	Unverified
On Benchmarking Iris Recognition within a Head-mounted Display for AR/VR Application	Oct 20, 2020	Benchmarking, Iris Recognition	Unverified
On Continual Model Refinement in Out-of-Distribution Data Streams	May 4, 2022	Benchmarking, Continual Learning	Unverified
Active Learning for Community Detection in Stochastic Block Models	May 8, 2016	Active Learning, Benchmarking	Unverified
On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events	Dec 9, 2024	Benchmarking, Computational Efficiency	Unverified
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos	Jan 1, 2024	Benchmarking	Unverified
On Distribution Grid Optimal Power Flow Development and Integration	Dec 9, 2022	Benchmarking	Unverified
ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities	Dec 9, 2024	All, Benchmarking	Unverified
One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision	Feb 3, 2021	Benchmarking, Fairness	Unverified
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	Benchmarking, Language Modeling	Unverified
One of these (Few) Things is Not Like the Others	May 22, 2020	Benchmarking, Few-Shot Learning	Unverified
Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios	Apr 16, 2025	Audio Deepfake Detection, Benchmarking	Unverified
One-Shot Federated Learning with Classifier-Free Diffusion Models	Feb 12, 2025	Benchmarking, Dataset Generation	Unverified
On Evaluation of Bangla Word Analogies	Apr 10, 2023	Benchmarking, Word Embeddings	Unverified
On Evaluation of Document Classification using RVL-CDIP	Jun 21, 2023	Benchmarking, Classification	Unverified
Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images	Dec 4, 2024	Benchmarking, Building Damage Assessment	Unverified
On General Language Understanding	Oct 27, 2023	Benchmarking, Ethics	Unverified
Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis	Jul 1, 2021	Benchmarking	Unverified
Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions	Aug 7, 2024	Anomaly Detection, Benchmarking	Unverified
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots	Sep 12, 2024	Benchmarking, Chatbot	Unverified
On loss functions and evaluation metrics for music source separation	Feb 16, 2022	Audio Source Separation, Benchmarking	Unverified
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling	Jul 19, 2019	Benchmarking, Motion Estimation	Unverified
On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction	Jul 15, 2024	Active Learning, Benchmarking	Unverified
An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems	Mar 12, 2024	Benchmarking	Unverified
On Neural Inertial Classification Networks for Pedestrian Activity Recognition	Feb 23, 2025	Activity Recognition, Benchmarking	Unverified
Zero-Forcing Max-Power Beamforming for Hybrid mmWave Full-Duplex MIMO Systems	Feb 29, 2020	Benchmarking	Unverified
LAraBench: Benchmarking Arabic AI with Large Language Models	May 24, 2023	Benchmarking, Few-Shot Learning	Unverified
On quantifying and improving realism of images generated with diffusion	Sep 26, 2023	Attribute, Benchmarking	Unverified
Active Evaluation Acquisition for Efficient LLM Benchmarking	Oct 8, 2024	Benchmarking	Unverified
On Symbiosis of Attribute Prediction and Semantic Segmentation	Nov 23, 2019	Attribute, Benchmarking	Unverified
On the Assessment of Benchmark Suites for Algorithm Comparison	Apr 15, 2021	Benchmarking	Unverified
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation	Jul 4, 2024	Benchmarking, Chatbot	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified