Benchmarking

Papers

Showing 1176–1200 of 5548 papers

Title	Date	Tasks	Status	Hype
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	Benchmarking, Retrieval	Code Available	1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness	Mar 24, 2025	Benchmarking, Semantic Segmentation	Code Available	1
End-to-end Emotion-Cause Pair Extraction via Learning to Link	Feb 25, 2020	Benchmarking, Emotion Cause Extraction	Code Available	1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit	Sep 7, 2022	Benchmarking	Code Available	1
Benchmarking Visual Localization for Autonomous Navigation	Mar 24, 2022	Autonomous Navigation, Benchmarking	Code Available	1
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1
Benchmarking Multi-Scene Fire and Smoke Detection	Oct 22, 2024	Benchmarking	Code Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	Benchmarking, Object	Code Available	1
Benchmarking Omni-Vision Representation through the Lens of Visual Realms	Jul 14, 2022	Benchmarking, Contrastive Learning	Code Available	1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	Jun 14, 2020	Benchmarking, Deep Reinforcement Learning	Code Available	1
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices	Jun 21, 2025	Benchmarking, CPU	Code Available	1
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	Benchmarking, Diversity	Code Available	1
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models	Oct 17, 2023	Benchmarking, Language Modelling	Code Available	1
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes	Nov 22, 2021	Benchmarking	Code Available	1
Evaluating Attribution for Graph Neural Networks	Dec 1, 2020	Benchmarking	Code Available	1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond	Jun 16, 2023	Benchmarking, Evidence Selection	Code Available	1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs	Nov 2, 2020	Benchmarking	Code Available	1
Benchmarking Neural Network Generalization for Grammar Induction	Aug 16, 2023	Benchmarking	Code Available	1
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial Defense, Benchmarking	Code Available	1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	Code Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	Code Available	1
EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs	Nov 5, 2022	Attribute, Benchmarking	Code Available	1
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	Benchmarking, Code Generation	Code Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarking, knowledge editing	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified