Benchmarking

Papers


Title	Date	Tasks	Status
Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease	Sep 21, 2017	Benchmarking, Classification	Unverified
Understanding the Limits of Lifelong Knowledge Editing in LLMs	Mar 7, 2025	Benchmarking, knowledge editing	Unverified
Who Wins the Game of Thrones? How Sentiments Improve the Prediction of Candidate Choice	Feb 29, 2020	Benchmarking, Holdout Set	Unverified
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective	Jun 19, 2024	Benchmarking, Continual Pretraining	Unverified
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture	Apr 21, 2025	Benchmarking, class-incremental learning	Unverified
A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain	Oct 31, 2023	Benchmarking, Diagnostic	Unverified
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models	Jun 3, 2024	Benchmarking, Code Completion	Unverified
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests	May 23, 2023	Benchmarking, Language Modeling	Unverified
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation	May 29, 2025	Benchmarking, Image Generation	Unverified
R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery	Jul 12, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking	Sep 29, 2021	Benchmarking, Multi-Task Learning	Unverified
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR	Nov 23, 2021	Benchmarking, Computed Tomography (CT)	Unverified
A tutorial on multi-view autoencoders using the multi-view-AE library	Mar 12, 2024	Benchmarking	Unverified
Understanding the User: An Intent-Based Ranking Dataset	Aug 30, 2024	Benchmarking, Information Retrieval	Unverified
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems	Jun 25, 2024	Benchmarking, RAG	Unverified
Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking	Jan 8, 2024	Benchmarking, Contrastive Learning	Unverified
A Theory of Dynamic Benchmarks	Oct 6, 2022	Benchmarking	Unverified
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Jan 22, 2025	Benchmarking, Hallucination	Unverified
Rail-5k: a Real-World Dataset for Rail Surface Defects Detection	Jun 28, 2021	4k, Benchmarking	Unverified
On the Evaluation of Engineering Artificial General Intelligence	May 15, 2025	Benchmarking	Unverified
A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality	Apr 5, 2022	Benchmarking, Self-Supervised Learning	Unverified
RAN-GNNs: breaking the capacity limits of graph neural networks	Mar 29, 2021	Attribute, Benchmarking	Unverified
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified
A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes	Apr 7, 2024	Benchmarking	Unverified
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games	Aug 28, 2024	Atari Games, Benchmarking	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified