Benchmarking

Papers

Showing 951–1000 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Differential Privacy and Federated Learning for BERT Models	Jun 26, 2021	Benchmarking, Federated Learning	Code Available	1	5
Accelerated and interpretable oblique random survival forests	Aug 1, 2022	Benchmarking, Computational Efficiency	Code Available	1	5
Decoding the Underlying Meaning of Multimodal Hateful Memes	May 28, 2023	Benchmarking, Hateful Meme Classification	Code Available	1	5
Benchmarking Distribution Shift in Tabular Data with TableShift	Dec 10, 2023	Benchmarking, Binary Classification	Code Available	1	5
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed	May 27, 2022	Benchmarking, Binary Classification	Code Available	1	5
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Mar 27, 2025	Attribute, Benchmarking	Code Available	1	5
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics	Aug 2, 2024	Adversarial Attack, Adversarial Purification	Code Available	1	5
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark	Jun 5, 2025	Benchmarking	Code Available	1	5
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension	Mar 26, 2022	Benchmarking, Question Answering	Code Available	1	5
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models	May 20, 2025	Benchmarking, Diagnostic	Code Available	1	5
Monash University, UEA, UCR Time Series Extrinsic Regression Archive	Jun 19, 2020	Benchmarking, Missing Values	Code Available	1	5
Benchmarking Econometric and Machine Learning Methodologies in Nowcasting	May 6, 2022	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
Benchmarking Robustness of 3D Object Detection to Common Corruptions	Jan 1, 2023	3D Object Detection, Autonomous Driving	Code Available	1	5
Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis	Sep 30, 2024	Benchmarking, Intrusion Detection	Code Available	1	5
Mukayese: Turkish NLP Strikes Back	Mar 2, 2022	Benchmarking, Language Modeling	Code Available	1	5
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective	Jul 10, 2024	Benchmarking, Diagnostic	Code Available	1	5
Benchmarking Omni-Vision Representation through the Lens of Visual Realms	Jul 14, 2022	Benchmarking, Contrastive Learning	Code Available	1	5
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding	Oct 16, 2023	Action Recognition, Benchmarking	Code Available	1	5
Benchmarking: Past, Present and Future	Aug 1, 2021	Benchmarking, Reading Comprehension	Code Available	1	5
EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions	Jun 8, 2021	Bayesian Optimisation, Benchmarking	Code Available	1	5
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
FedScale: Benchmarking Model and System Performance of Federated Learning at Scale	May 24, 2021	Benchmarking, Federated Learning	Code Available	1	5
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4	Mar 20, 2023	Benchmarking, De-identification	Code Available	1	5
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1	5
Benchmarking Object Detectors with COCO: A New Path Forward	Mar 27, 2024	Benchmarking, Object	Code Available	1	5
Delving into Out-of-Distribution Detection with Medical Vision-Language Models	Mar 2, 2025	Benchmarking, image-classification	Code Available	1	5
Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks	Aug 18, 2019	Benchmarking, Image Classification	Code Available	1	5
DependEval: Benchmarking LLMs for Repository Dependency Understanding	Mar 9, 2025	Benchmarking, Code Generation	Code Available	1	5
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware	Jul 28, 2023	Benchmarking, reinforcement-learning	Code Available	1	5
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery	Mar 24, 2025	Benchmarking, Humanitarian	Code Available	1	5
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers	Jan 1, 2021	Benchmarking, Deep Learning	Code Available	1	5
Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition	May 18, 2021	Action Recognition, Action Recognition In Videos	Code Available	1	5
Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging	Apr 22, 2024	Benchmarking	Code Available	1	5
Detecting beats in the photoplethysmogram: benchmarking open-source algorithms	Jul 19, 2022	Benchmarking, Photoplethysmography (PPG) beat detection	Code Available	1	5
MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery	Feb 18, 2022	Benchmarking, Representation Learning	Code Available	1	5
Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments	Apr 27, 2024	Autonomous Vehicles, Benchmarking	Code Available	1	5
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs	Feb 23, 2024	Benchmarking, slot-filling	Code Available	1	5
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering	Aug 31, 2023	Benchmarking, Dataset Generation	Code Available	1	5
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs	Feb 21, 2025	Benchmarking	Code Available	1	5
Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding	May 21, 2024	Benchmarking, Keypoint Detection	Code Available	1	5
Explainable Benchmarking for Iterative Optimization Heuristics	Jan 31, 2024	Benchmarking, Evolutionary Algorithms	Code Available	1	5
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations	Oct 17, 2023	Benchmarking, Emotion Recognition	Code Available	1	5
NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks	Oct 12, 2021	Benchmarking, image-classification	Code Available	1	5
NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search	Jun 18, 2022	Benchmarking, Graph Neural Network	Code Available	1	5
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial Defense, Benchmarking	Code Available	1	5
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT	Jul 9, 2021	Benchmarking, Document Classification	Code Available	1	5
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Aug 11, 2023	Benchmarking, Diversity	Code Available	1	5
DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information	Jan 10, 2025	Benchmarking, Data Augmentation	Code Available	1	5
Protein Structure Tokenization: Benchmarking and New Recipe	Feb 28, 2025	Benchmarking, Language Modeling	Code Available	1	5
Benchmarking Neural Network Generalization for Grammar Induction	Aug 16, 2023	Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified