Benchmarking

Papers

Showing 1001–1050 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1	5
Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation	Mar 7, 2024	Benchmarking, Multimodal Recommendation	Code Available	1	5
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering	May 22, 2025	Benchmarking, Evidence Selection	Code Available	1	5
Benchmarking Robustness of 3D Object Detection to Common Corruptions	Jan 1, 2023	3D Object Detection, Autonomous Driving	Code Available	1	5
A Comparison of Image Denoising Methods	Apr 18, 2023	Benchmarking, Denoising	Code Available	1	5
Formalizing Multimedia Recommendation through Multimodal Deep Learning	Sep 11, 2023	Benchmarking, Deep Learning	Code Available	1	5
Continual Learning with Foundation Models: An Empirical Study of Latent Replay	Apr 30, 2022	Benchmarking, Continual Learning	Code Available	1	5
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph	May 23, 2025	Benchmarking, Management	Code Available	1	5
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1	5
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks	Apr 5, 2022	Benchmarking	Code Available	1	5
AI Agents That Matter	Jul 1, 2024	Benchmarking	Code Available	1	5
Earnings-22: A Practical Benchmark for Accents in the Wild	Mar 29, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1	5
FNBench: Benchmarking Robust Federated Learning against Noisy Labels	May 10, 2025	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	Benchmarking, Knowledge Base Question Answering	Code Available	1	5
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation	Oct 10, 2022	Autonomous Navigation, Benchmarking	Code Available	1	5
EBES: Easy Benchmarking for Event Sequences	Oct 4, 2024	Benchmarking	Code Available	1	5
AI Accelerator Survey and Trends	Sep 18, 2021	Benchmarking, Computational Efficiency	Code Available	1	5
FM-TS: Flow Matching for Time Series Generation	Nov 12, 2024	Benchmarking, Imputation	Code Available	1	5
FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding	Sep 28, 2023	Benchmarking, Image Retrieval	Code Available	1	5
EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset	Oct 11, 2021	Benchmarking, Face Hallucination	Code Available	1	5
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios	May 22, 2025	Benchmarking	Code Available	1	5
Flames: Benchmarking Value Alignment of LLMs in Chinese	Nov 12, 2023	Benchmarking, Fairness	Code Available	1	5
Benchmarking Quantized Neural Networks on FPGAs with FINN	Feb 2, 2021	Benchmarking, Quantization	Code Available	1	5
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining	Nov 22, 2017	Benchmarking, feature selection	Code Available	1	5
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses	Mar 3, 2025	Benchmarking	Code Available	1	5
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation	May 27, 2025	Benchmarking, Decision Making	Code Available	1	5
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis	Mar 9, 2021	Benchmarking, Classification	Code Available	1	5
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation	Feb 10, 2025	Benchmarking	Code Available	1	5
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1	5
Benchmarking Visual Localization for Autonomous Navigation	Mar 24, 2022	Autonomous Navigation, Benchmarking	Code Available	1	5
FiFAR: A Fraud Detection Dataset for Learning to Defer	Dec 20, 2023	Benchmarking, Decision Making	Code Available	1	5
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking	Jun 15, 2024	Benchmarking, GPU	Code Available	1	5
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging	Jun 6, 2025	Benchmarking	Code Available	1	5
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization	Aug 10, 2023	Benchmarking, Decision Making	Code Available	1	5
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding	Sep 27, 2021	Benchmarking, Natural Language Understanding	Code Available	1	5
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records	Nov 22, 2021	Benchmarking	Code Available	1	5
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification	Nov 20, 2023	Benchmarking, image-classification	Code Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1	5
FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods	Jun 15, 2023	Benchmarking, Fairness	Code Available	1	5
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	Benchmarking, Hallucination	Code Available	1	5
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction	Jul 25, 2024	Benchmarking, Deep Learning	Code Available	1	5
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging	Apr 26, 2020	Benchmarking, Left Atrium Segmentation	Code Available	1	5
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models	Mar 31, 2023	Benchmarking, Causal Discovery	Code Available	1	5
A global analysis of metrics used for measuring performance in natural language processing	Apr 25, 2022	Benchmarking, Machine Translation	Code Available	1	5
Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces	May 23, 2023	Benchmarking	Code Available	1	5
FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data	Mar 7, 2025	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking: Past, Present and Future	Aug 1, 2021	Benchmarking, Reading Comprehension	Code Available	1	5
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks	Nov 22, 2021	Benchmarking, Federated Learning	Code Available	1	5
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images	Oct 25, 2022	Benchmarking, Few-Shot Object Detection	Code Available	1	5
ArtFID: Quantitative Evaluation of Neural Style Transfer	Jul 25, 2022	Benchmarking, Meta-Learning	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified