Benchmarking

Papers


Showing 701–750 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information	Jan 10, 2025	Benchmarking, Data Augmentation	Code Available	1	5
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	Benchmarking, Cloud Removal	Code Available	1	5
Benchmarking Offline Reinforcement Learning on Real-Robot Hardware	Jul 28, 2023	Benchmarking, reinforcement-learning	Code Available	1	5
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records	Nov 22, 2021	Benchmarking	Code Available	1	5
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting	Aug 21, 2020	Automatic Sleep Stage Classification, Benchmarking	Code Available	1	5
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning	Dec 11, 2023	Benchmarking, Human-Object Interaction Detection	Code Available	1	5
Benchmarking: Past, Present and Future	Aug 1, 2021	Benchmarking, Reading Comprehension	Code Available	1	5
Benchmarking Omni-Vision Representation through the Lens of Visual Realms	Jul 14, 2022	Benchmarking, Contrastive Learning	Code Available	1	5
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models	May 20, 2025	Benchmarking, Diagnostic	Code Available	1	5
Autonomous Microscopy Experiments through Large Language Model Agents	Dec 18, 2024	Benchmarking, Experimental Design	Code Available	1	5
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking	Oct 14, 2022	Benchmarking, GPU	Code Available	1	5
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations	Oct 17, 2023	Benchmarking, Emotion Recognition	Code Available	1	5
Emoji Prediction: Extensions and Benchmarking	Jul 14, 2020	Benchmarking, Multi-Label Classification	Code Available	1	5
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Aug 11, 2023	Benchmarking, Diversity	Code Available	1	5
A Ladder of Causal Distances	May 5, 2020	Benchmarking, Causal Discovery	Code Available	1	5
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging	Apr 30, 2024	Benchmarking, Image Reconstruction	Code Available	1	5
Atom-Level Optical Chemical Structure Recognition with Limited Supervision	Apr 2, 2024	Benchmarking	Code Available	1	5
End-to-end Emotion-Cause Pair Extraction via Learning to Link	Feb 25, 2020	Benchmarking, Emotion Cause Extraction	Code Available	1	5
DFGC 2022: The Second DeepFake Game Competition	Jun 30, 2022	Benchmarking, Face Swapping	Code Available	1	5
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones	Nov 5, 2023	Benchmarking	Code Available	1	5
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models	Jun 2, 2023	Benchmarking, Language Acquisition	Code Available	1	5
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective	Oct 8, 2024	Attribute, Benchmarking	Code Available	1	5
dMelodies: A Music Dataset for Disentanglement Learning	Jul 29, 2020	Benchmarking, Disentanglement	Code Available	1	5
Benchmarking Quantized Neural Networks on FPGAs with FINN	Feb 2, 2021	Benchmarking, Quantization	Code Available	1	5
Detecting beats in the photoplethysmogram: benchmarking open-source algorithms	Jul 19, 2022	Benchmarking, Photoplethysmography (PPG) beat detection	Code Available	1	5
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	Benchmarking, Multiple-choice	Code Available	1	5
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios	Oct 31, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1	5
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers	Jul 3, 2020	Benchmarking, Deep Learning	Code Available	1	5
Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers	Jan 1, 2021	Benchmarking, Deep Learning	Code Available	1	5
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	Benchmarking, Language Modelling	Code Available	1	5
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering	Aug 31, 2023	Benchmarking, Dataset Generation	Code Available	1	5
Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining	Nov 22, 2017	Benchmarking, feature selection	Code Available	1	5
Evaluating Attribution for Graph Neural Networks	Dec 1, 2020	Benchmarking	Code Available	1	5
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions	Jan 1, 2024	Benchmarking, Instruction Following	Code Available	1	5
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain	Jun 1, 2022	Benchmarking, Emotion Recognition	Code Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1	5
Geometric Deep Learning for Structure-Based Drug Design: A Survey	Jun 20, 2023	Benchmarking, Deep Learning	Code Available	1	5
A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks	Dec 20, 2022	3D Object Detection, Benchmarking	Code Available	1	5
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Feb 28, 2024	Benchmarking, Multiple-choice	Code Available	1	5
Benchmarking Robustness of 3D Object Detection to Common Corruptions	Jan 1, 2023	3D Object Detection, Autonomous Driving	Code Available	1	5
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects	May 9, 2023	Benchmarking, Decision Making	Code Available	1	5
EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs	Nov 5, 2022	Attribute, Benchmarking	Code Available	1	5
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis	Aug 12, 2021	Benchmarking, Medical Image Analysis	Code Available	1	5
Benchmarking saliency methods for chest X-ray interpretation	Oct 10, 2022	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Robustness to Adversarial Image Obfuscations	Jan 30, 2023	Benchmarking	Code Available	1	5
Beacon, a lightweight deep reinforcement learning benchmark library for flow control	Feb 27, 2024	Benchmarking, CPU	Code Available	1	5
Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging	Apr 22, 2024	Benchmarking	Code Available	1	5
Explainable Benchmarking for Iterative Optimization Heuristics	Jan 31, 2024	Benchmarking, Evolutionary Algorithms	Code Available	1	5
Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency	Jun 14, 2024	Benchmarking	Code Available	1	5
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	Benchmarking, News Summarization	Code Available	1	5



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified