Benchmarking

Papers


Showing 1776–1800 of 5548 papers

Title	Date	Tasks	Status
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools	Jan 1, 2025	Benchmarking	Unverified
Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis	Jul 1, 2021	Benchmarking	Unverified
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices	Jan 1, 2025	Benchmarking	Unverified
LAraBench: Benchmarking Arabic AI with Large Language Models	May 24, 2023	Benchmarking, Few-Shot Learning	Unverified
Cognitive Model Priors for Predicting Human Decisions	May 22, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
Coherent Feed Forward Quantum Neural Network	Feb 1, 2024	Benchmarking, Diagnostic	Unverified
Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks	Apr 30, 2020	Benchmarking, Coherence Evaluation	Unverified
ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors	Dec 15, 2023	Benchmarking, Classification	Unverified
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition	Oct 16, 2023	Benchmarking, Micro Expression Recognition	Unverified
Diverse Community Data for Benchmarking Data Privacy Algorithms	Jun 20, 2023	Benchmarking	Unverified
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models	May 18, 2025	Articles, Benchmarking	Unverified
An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction	Nov 3, 2023	Benchmarking, Sentence	Unverified
Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration	Jun 17, 2022	Benchmarking, Depth Estimation	Unverified
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance	Aug 4, 2024	Action Anticipation, Benchmarking	Unverified
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task	Apr 27, 2023	Articles, Benchmarking	Unverified
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics	Apr 21, 2022	Attribute, Benchmarking	Unverified
Distribution-Based Invariant Deep Networks for Learning Meta-Features	Jun 24, 2020	Benchmarking, General Classification	Unverified
Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories	Nov 7, 2022	3D Reconstruction, 4D reconstruction	Unverified
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics	Sep 17, 2021	Attribute, Benchmarking	Unverified
ChatGPT Alternative Solutions: Large Language Models Survey	Mar 21, 2024	Benchmarking, Chatbot	Unverified
Commute Graph Neural Networks	Jun 30, 2024	Benchmarking	Unverified
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets	Dec 2, 2023	Benchmarking	Unverified
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts	May 23, 2025	Benchmarking	Unverified
Distributed Training Large-Scale Deep Architectures	Aug 10, 2017	Benchmarking, Deep Learning	Unverified
Sensitivity analysis and experimental evaluation of PID-like continuous sliding mode control	Aug 13, 2022	Benchmarking, Sensitivity	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified