Benchmarking

Papers

Showing 1201–1250 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding	Nov 6, 2023	Benchmarking, Data Compression	Code Available	1	5
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels	Jan 30, 2024	Benchmarking, image-classification	Code Available	1	5
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery	Mar 24, 2025	Benchmarking, Humanitarian	Code Available	1	5
Benchmarking Object Detectors with COCO: A New Path Forward	Mar 27, 2024	Benchmarking, Object	Code Available	1	5
Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs	Sep 18, 2021	Benchmarking, Complex Query Answering	Code Available	1	5
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms	Aug 2, 2021	Benchmarking, counterfactual	Code Available	1	5
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	Anatomy, Benchmarking	Code Available	1	5
Benchmarking the Generation of Fact Checking Explanations	Aug 29, 2023	Abstractive Text Summarization, Articles	Code Available	1	5
CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery	Oct 3, 2023	Benchmarking, Causal Discovery	Code Available	1	5
Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework	Jun 12, 2024	Benchmarking, Causal Inference	Code Available	1	5
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	Benchmarking, Language Modeling	Code Available	1	5
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs	Apr 5, 2021	Benchmarking, Knowledge Graphs	Code Available	1	5
How to Train Neural Field Representations: A Comprehensive Study and Benchmark	Dec 16, 2023	Benchmarking	Code Available	1	5
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification	Jul 6, 2023	Benchmarking, Domain Adaptation	Code Available	1	5
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond	Jun 16, 2023	Benchmarking, Evidence Selection	Code Available	1	5
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding	Oct 16, 2023	Action Recognition, Benchmarking	Code Available	1	5
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims	Feb 17, 2025	Benchmarking, Fact Checking	Code Available	1	5
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	Benchmarking, Knowledge Graph Completion	Code Available	1	5
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing	Jun 7, 2023	Benchmarking, Prompt Engineering	Code Available	1	5
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1	5
LEMUR Neural Network Dataset: Towards Seamless AutoML	Apr 14, 2025	AutoML, Benchmarking	Code Available	1	5
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics	Aug 2, 2024	Adversarial Attack, Adversarial Purification	Code Available	1	5
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	Code Available	1	5
Benchmarking Omni-Vision Representation through the Lens of Visual Realms	Jul 14, 2022	Benchmarking, Contrastive Learning	Code Available	1	5
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classification, Benchmarking	Code Available	1	5
HINT3: Raising the bar for Intent Detection in the Wild	Sep 29, 2020	Benchmarking, Intent Detection	Code Available	1	5
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action Recognition, Attribute	Code Available	1	5
Histo-Genomic Knowledge Distillation For Cancer Prognosis From Histopathology Whole Slide Images	Mar 15, 2024	Benchmarking, Knowledge Distillation	Code Available	1	5
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	Benchmarking, Diagnostic	Code Available	1	5
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	Benchmarking, Code Generation	Code Available	1	5
Hierarchical graph neural nets can capture long-range interactions	Jul 15, 2021	Benchmarking, Molecular Property Prediction	Code Available	1	5
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection	Jun 28, 2023	Benchmarking, Data Augmentation	Code Available	1	5
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems	May 11, 2021	Benchmarking, Edge-computing	Code Available	1	5
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Dec 12, 2023	Anomaly Detection, Autonomous Driving	Code Available	1	5
Benchmarking Language Models for Code Syntax Understanding	Oct 26, 2022	Benchmarking	Code Available	1	5
LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild	May 30, 2024	Benchmarking	Code Available	1	5
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction	Nov 16, 2023	Benchmarking, Event Extraction	Code Available	1	5
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Nov 26, 2024	Benchmarking, Text-to-Video Generation	Code Available	1	5
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models	Nov 29, 2021	Benchmarking, Physical Simulations	Code Available	1	5
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care	Sep 16, 2022	Benchmarking, Deep Learning	Code Available	1	5
HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis	Feb 13, 2021	Benchmarking, Clustering	Code Available	1	5
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	Benchmarking, Code Generation	Code Available	1	5
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	Benchmarking, Continual Learning	Code Available	1	5
Clinical Prompt Learning with Frozen Language Models	May 11, 2022	Benchmarking, GPU	Code Available	1	5
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters	Jul 5, 2024	Benchmarking, valid	Code Available	1	5
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing	Sep 25, 2024	Benchmarking, Image Dehazing	Code Available	1	5
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning	Jul 22, 2024	Benchmarking, Hallucination	Code Available	1	5
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency	Jun 14, 2024	Benchmarking	Code Available	1	5
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns	Jan 28, 2025	Adversarial Attack, Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified