Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3926–3950 of 5548 papers

Title	Date	Tasks	Status
ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities	Dec 9, 2024	AllBenchmarking	—Unverified
One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision	Feb 3, 2021	BenchmarkingFairness	—Unverified
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	BenchmarkingLanguage Modeling	—Unverified
One of these (Few) Things is Not Like the Others	May 22, 2020	BenchmarkingFew-Shot Learning	—Unverified
Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios	Apr 16, 2025	Audio Deepfake DetectionBenchmarking	—Unverified
One-Shot Federated Learning with Classifier-Free Diffusion Models	Feb 12, 2025	BenchmarkingDataset Generation	—Unverified
On Evaluation of Bangla Word Analogies	Apr 10, 2023	BenchmarkingWord Embeddings	—Unverified
On Evaluation of Document Classification using RVL-CDIP	Jun 21, 2023	BenchmarkingClassification	—Unverified
Benchmarking Attention Mechanisms and Consistency Regularization Semi-Supervised Learning for Post-Flood Building Damage Assessment in Satellite Images	Dec 4, 2024	BenchmarkingBuilding Damage Assessment	—Unverified
On General Language Understanding	Oct 27, 2023	BenchmarkingEthics	—Unverified
Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis	Jul 1, 2021	Benchmarking	—Unverified
Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions	Aug 7, 2024	Anomaly DetectionBenchmarking	—Unverified
Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots	Sep 12, 2024	BenchmarkingChatbot	—Unverified
On loss functions and evaluation metrics for music source separation	Feb 16, 2022	Audio Source SeparationBenchmarking	—Unverified
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling	Jul 19, 2019	BenchmarkingMotion Estimation	—Unverified
On Machine Learning Approaches for Protein-Ligand Binding Affinity Prediction	Jul 15, 2024	Active LearningBenchmarking	—Unverified
An Approach to Evaluate Modeling Adequacy for Small-Signal Stability Analysis of IBR-related SSOs in Multimachine Systems	Mar 12, 2024	Benchmarking	—Unverified
On Neural Inertial Classification Networks for Pedestrian Activity Recognition	Feb 23, 2025	Activity RecognitionBenchmarking	—Unverified
Zero-Forcing Max-Power Beamforming for Hybrid mmWave Full-Duplex MIMO Systems	Feb 29, 2020	Benchmarking	—Unverified
LAraBench: Benchmarking Arabic AI with Large Language Models	May 24, 2023	BenchmarkingFew-Shot Learning	—Unverified
On quantifying and improving realism of images generated with diffusion	Sep 26, 2023	AttributeBenchmarking	—Unverified
Active Evaluation Acquisition for Efficient LLM Benchmarking	Oct 8, 2024	Benchmarking	—Unverified
On Symbiosis of Attribute Prediction and Semantic Segmentation	Nov 23, 2019	AttributeBenchmarking	—Unverified
On the Assessment of Benchmark Suites for Algorithm Comparison	Apr 15, 2021	Benchmarking	—Unverified
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation	Jul 4, 2024	BenchmarkingChatbot	—Unverified

Show:10 25 50

← PrevPage 158 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified