Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1151–1200 of 5548 papers

Title	Date	Tasks	Status	Hype
CovDocker: Benchmarking Covalent Drug Design with Tasks, Datasets, and Solutions	Jun 26, 2025	BenchmarkingDrug Design	CodeCode Available	1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Jan 2, 2025	BenchmarkingComputer Security	CodeCode Available	1
DependEval: Benchmarking LLMs for Repository Dependency Understanding	Mar 9, 2025	BenchmarkingCode Generation	CodeCode Available	1
Benchmarking Quantized Neural Networks on FPGAs with FINN	Feb 2, 2021	BenchmarkingQuantization	CodeCode Available	1
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime	May 28, 2024	BenchmarkingReinforcement Learning (RL)	CodeCode Available	1
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Feb 27, 2024	Benchmarking	CodeCode Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	CodeCode Available	1
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	BenchmarkingPrediction	CodeCode Available	1
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	BenchmarkingNews Summarization	CodeCode Available	1
A global analysis of metrics used for measuring performance in natural language processing	Apr 25, 2022	BenchmarkingMachine Translation	CodeCode Available	1
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models	Mar 31, 2023	BenchmarkingCausal Discovery	CodeCode Available	1
EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset	Oct 11, 2021	BenchmarkingFace Hallucination	CodeCode Available	1
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging	Apr 26, 2020	BenchmarkingLeft Atrium Segmentation	CodeCode Available	1
Benchmarking Multidomain English-Indonesian Machine Translation	May 1, 2020	BenchmarkingMachine Translation	CodeCode Available	1
AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction	Jul 25, 2024	BenchmarkingDeep Learning	CodeCode Available	1
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding	Jul 17, 2023	BenchmarkingDeep Learning	CodeCode Available	1
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Jun 15, 2025	Benchmarking	CodeCode Available	1
Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Apr 25, 2024	Benchmarkingobject-detection	CodeCode Available	1
EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search	Nov 24, 2021	BenchmarkingNeural Architecture Search	CodeCode Available	1
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models	May 26, 2025	BenchmarkingRAG	CodeCode Available	1
A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization	Aug 10, 2023	BenchmarkingDecision Making	CodeCode Available	1
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation	Oct 10, 2022	Autonomous NavigationBenchmarking	CodeCode Available	1
Recent Advances on Neural Network Pruning at Initialization	Mar 11, 2021	BenchmarkingNetwork Pruning	CodeCode Available	1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	BenchmarkingElectromyography (EMG)	CodeCode Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	BenchmarkingRetrieval	CodeCode Available	1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness	Mar 24, 2025	BenchmarkingSemantic Segmentation	CodeCode Available	1
End-to-end Emotion-Cause Pair Extraction via Learning to Link	Feb 25, 2020	BenchmarkingEmotion Cause Extraction	CodeCode Available	1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit	Sep 7, 2022	Benchmarking	CodeCode Available	1
Benchmarking Visual Localization for Autonomous Navigation	Mar 24, 2022	Autonomous NavigationBenchmarking	CodeCode Available	1
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	BenchmarkingDeep Learning	CodeCode Available	1
Benchmarking Multi-Scene Fire and Smoke Detection	Oct 22, 2024	Benchmarking	CodeCode Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	BenchmarkingObject	CodeCode Available	1
Benchmarking Omni-Vision Representation through the Lens of Visual Realms	Jul 14, 2022	BenchmarkingContrastive Learning	CodeCode Available	1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	Jun 14, 2020	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices	Jun 21, 2025	BenchmarkingCPU	CodeCode Available	1
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	BenchmarkingDiversity	CodeCode Available	1
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	BenchmarkingLanguage Modelling	CodeCode Available	1
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes	Nov 22, 2021	Benchmarking	CodeCode Available	1
Evaluating Attribution for Graph Neural Networks	Dec 1, 2020	Benchmarking	CodeCode Available	1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond	Jun 16, 2023	BenchmarkingEvidence Selection	CodeCode Available	1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs	Nov 2, 2020	Benchmarking	CodeCode Available	1
Benchmarking Neural Network Generalization for Grammar Induction	Aug 16, 2023	Benchmarking	CodeCode Available	1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial DefenseBenchmarking	CodeCode Available	1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	BenchmarkingGeneral Knowledge	CodeCode Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1
EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs	Nov 5, 2022	AttributeBenchmarking	CodeCode Available	1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	BenchmarkingCode Generation	CodeCode Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarkingknowledge editing	CodeCode Available	1

Show:10 25 50

← PrevPage 24 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified