Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 651–700 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information	Jan 10, 2025	BenchmarkingData Augmentation	CodeCode Available	1	5
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects	May 9, 2023	BenchmarkingDecision Making	CodeCode Available	1	5
Detecting beats in the photoplethysmogram: benchmarking open-source algorithms	Jul 19, 2022	BenchmarkingPhotoplethysmography (PPG) beat detection	CodeCode Available	1	5
DFGC 2021: A DeepFake Game Competition	Jun 2, 2021	BenchmarkingDeepFake Detection	CodeCode Available	1	5
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Aug 11, 2023	BenchmarkingDiversity	CodeCode Available	1	5
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	BenchmarkingMultiple-choice	CodeCode Available	1	5
A Computed Tomography Vertebral Segmentation Dataset with Anatomical Variations and Multi-Vendor Scanner Data	Mar 10, 2021	AnatomyBenchmarking	CodeCode Available	1	5
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers	Jul 3, 2020	BenchmarkingDeep Learning	CodeCode Available	1	5
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	BenchmarkingNews Summarization	CodeCode Available	1	5
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards	May 7, 2025	BenchmarkingHallucination	CodeCode Available	1	5
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action RecognitionBenchmarking	CodeCode Available	1	5
Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers	Jan 1, 2021	BenchmarkingDeep Learning	CodeCode Available	1	5
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones	Nov 5, 2023	Benchmarking	CodeCode Available	1	5
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?	Apr 29, 2024	Answer GenerationBenchmarking	CodeCode Available	1	5
Benchmarking Language Models for Code Syntax Understanding	Oct 26, 2022	Benchmarking	CodeCode Available	1	5
AudioMarkBench: Benchmarking Robustness of Audio Watermarking	Jun 11, 2024	Benchmarkingtext-to-speech	CodeCode Available	1	5
Delving into Out-of-Distribution Detection with Medical Vision-Language Models	Mar 2, 2025	Benchmarkingimage-classification	CodeCode Available	1	5
RobFR: Benchmarking Adversarial Robustness on Face Recognition	Jul 8, 2020	Adversarial RobustnessBenchmarking	CodeCode Available	1	5
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	BenchmarkingCode Generation	CodeCode Available	1	5
A Large-Scale Dataset for Benchmarking Elevator Button Segmentation and Character Recognition	Mar 16, 2021	BenchmarkingPosition	CodeCode Available	1	5
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions	Jan 1, 2024	BenchmarkingInstruction Following	CodeCode Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	BenchmarkingImage to text	CodeCode Available	1	5
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4	Mar 20, 2023	BenchmarkingDe-identification	CodeCode Available	1	5
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation	Feb 18, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1	5
A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection	Mar 5, 2022	BenchmarkingCopy Detection	CodeCode Available	1	5
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Feb 28, 2024	BenchmarkingMultiple-choice	CodeCode Available	1	5
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games	Aug 25, 2021	BenchmarkingVideo Classification	CodeCode Available	1	5
Benchmarking Language Model Creativity: A Case Study on Code Generation	Jul 12, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Dec 11, 2024	AttributeBenchmarking	CodeCode Available	1	5
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models	Apr 22, 2024	BenchmarkingWorld Knowledge	CodeCode Available	1	5
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios	Oct 31, 2024	BenchmarkingLLM-generated Text Detection	CodeCode Available	1	5
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering	Aug 31, 2023	BenchmarkingDataset Generation	CodeCode Available	1	5
Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking	Feb 19, 2021	BenchmarkingOpenAI Gym	CodeCode Available	1	5
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations	Apr 15, 2024	BenchmarkingBias Detection	CodeCode Available	1	5
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models	May 20, 2025	BenchmarkingDiagnostic	CodeCode Available	1	5
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models	Jun 24, 2024	BenchmarkingData Augmentation	CodeCode Available	1	5
Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks	Aug 18, 2019	BenchmarkingImage Classification	CodeCode Available	1	5
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling	Jun 11, 2025	BenchmarkingComputational Efficiency	CodeCode Available	1	5
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM	Mar 28, 2024	Benchmarking	CodeCode Available	1	5
Benchmarking Meta-embeddings: What Works and What Does Not	Nov 1, 2021	BenchmarkingEmbeddings Evaluation	CodeCode Available	1	5
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	CodeCode Available	1	5
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects	Oct 3, 2024	BenchmarkingImitation Learning	CodeCode Available	1	5
Align and Distill: Unifying and Improving Domain Adaptive Object Detection	Mar 18, 2024	Benchmarkingobject-detection	CodeCode Available	1	5
Deep learning model solves change point detection for multiple change types	Apr 15, 2022	BenchmarkingChange Point Detection	CodeCode Available	1	5
Deep Learning-Based Synchronization for Uplink NB-IoT	May 22, 2022	BenchmarkingDeep Learning	CodeCode Available	1	5
Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans	Jan 14, 2021	BenchmarkingMedical Diagnosis	CodeCode Available	1	5
Benchmarking Meaning Representations in Neural Semantic Parsing	Nov 1, 2020	BenchmarkingSemantic Parsing	CodeCode Available	1	5
DocuMint: Docstring Generation for Python using Small Language Models	May 16, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs	Feb 13, 2025	BenchmarkingRetrieval	CodeCode Available	1	5
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking	Oct 14, 2022	BenchmarkingGPU	CodeCode Available	1	5

Show:10 25 50

← PrevPage 14 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified