Benchmarking

Papers

Showing 1001–1025 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1	5
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica	Sep 6, 2021	Benchmarking	Code Available	1	5
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks	Nov 22, 2021	Benchmarking, Federated Learning	Code Available	1	5
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Feb 13, 2025	Benchmarking, Retrieval	Code Available	1	5
featsel: A framework for benchmarking of feature selection algorithms and cost functions	Jul 19, 2017	Benchmarking, Computational Efficiency	Code Available	1	5
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things	Sep 29, 2023	Benchmarking, Federated Learning	Code Available	1	5
RADAR: Benchmarking Language Models on Imperfect Tabular Data	Jun 9, 2025	Benchmarking, Missing Values	Code Available	1	5
Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?	Aug 14, 2023	Benchmarking, Drug Design	Code Available	1	5
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization	Nov 15, 2023	Benchmarking, Instruction Following	Code Available	1	5
DomainLab: A modular Python package for domain generalization in deep learning	Mar 21, 2024	Benchmarking, Domain Generalization	Code Available	1	5
Federated Learning Under Intermittent Client Availability and Time-Varying Communication Constraints	May 13, 2022	Benchmarking, Federated Learning	Code Available	1	5
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?	Apr 29, 2024	Answer Generation, Benchmarking	Code Available	1	5
Benchmarking: Past, Present and Future	Aug 1, 2021	Benchmarking, Reading Comprehension	Code Available	1	5
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089	Nov 6, 2023	Benchmarking, Knowledge Base Question Answering	Code Available	1	5
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension	Mar 26, 2022	Benchmarking, Question Answering	Code Available	1	5
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
A Comparison of Image Denoising Methods	Apr 18, 2023	Benchmarking, Denoising	Code Available	1	5
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation	May 30, 2025	All, Benchmarking	Code Available	1	5
Fast hyperboloid decision tree algorithms	Oct 20, 2023	Benchmarking, Riemannian optimization	Code Available	1	5
AI Agents That Matter	Jul 1, 2024	Benchmarking	Code Available	1	5
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware	Jul 28, 2023	Benchmarking, reinforcement-learning	Code Available	1	5
AI Accelerator Survey and Trends	Sep 18, 2021	Benchmarking, Computational Efficiency	Code Available	1	5
EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions	Jun 8, 2021	Bayesian Optimisation, Benchmarking	Code Available	1	5
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Mar 27, 2025	Attribute, Benchmarking	Code Available	1	5
Benchmarking Object Detectors with COCO: A New Path Forward	Mar 27, 2024	Benchmarking, Object	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified