Benchmarking

Papers


Showing 751–800 of 5548 papers

Title	Date	Tasks	Status	Hype
ClearPose: Large-scale Transparent Object Dataset and Benchmark	Mar 8, 2022	Benchmarking, Depth Completion	Code Available	1
Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials	Nov 6, 2021	Benchmarking, Neural Network simulation	Code Available	1
Large Scale MRI Collection and Segmentation of Cirrhotic Liver	Oct 6, 2024	Benchmarking, Diagnostic	Code Available	1
BeHonest: Benchmarking Honesty in Large Language Models	Jun 19, 2024	Benchmarking, Misinformation	Code Available	1
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain	Jun 1, 2022	Benchmarking, Emotion Recognition	Code Available	1
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling	Nov 1, 2021	Benchmarking, object-detection	Code Available	1
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	Benchmarking, Electromyography (EMG)	Code Available	1
Geometric Deep Learning for Structure-Based Drug Design: A Survey	Jun 20, 2023	Benchmarking, Deep Learning	Code Available	1
A Comprehensive Study of the Robustness for LiDAR-based 3D Object Detectors against Adversarial Attacks	Dec 20, 2022	3D Object Detection, Benchmarking	Code Available	1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	Benchmarking, DeepFake Detection	Code Available	1
A multi-schematic classifier-independent oversampling approach for imbalanced datasets	Jul 15, 2021	Benchmarking	Code Available	1
End-to-end Knowledge Retrieval with Multi-modal Queries	Jun 1, 2023	Benchmarking, Cross-Modal Retrieval	Code Available	1
Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers	Jul 9, 2020	Benchmarking	Code Available	1
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization	May 27, 2025	Benchmarking	Code Available	1
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective	Oct 8, 2024	Attribute, Benchmarking	Code Available	1
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models	Dec 5, 2023	Benchmarking, Visual Question Answering	Code Available	1
Coarse-to-Fine Q-attention with Learned Path Ranking	Apr 4, 2022	Benchmarking	Code Available	1
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis	Aug 12, 2021	Benchmarking, Medical Image Analysis	Code Available	1
Anabranch Network for Camouflaged Object Segmentation	May 20, 2021	Benchmarking, Camouflaged Object Segmentation	Code Available	1
Evaluating Attribution for Graph Neural Networks	Dec 1, 2020	Benchmarking	Code Available	1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	Code Available	1
Evaluating Multimodal Representations on Visual Semantic Textual Similarity	Apr 4, 2020	Benchmarking, Image Captioning	Code Available	1
Evaluation of large language models for discovery of gene set function	Sep 7, 2023	Benchmarking, Language Modelling	Code Available	1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	Code Available	1
Benchmarking deep inverse models over time, and the neural-adjoint method	Sep 27, 2020	Benchmarking	Code Available	1
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	Code Available	1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models	Jul 17, 2023	Benchmarking, Graph Clustering	Code Available	1
Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization	Dec 27, 2021	Bayesian Optimization, Benchmarking	Code Available	1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling	May 23, 2024	Benchmarking	Code Available	1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19	Feb 9, 2021	Benchmarking, Q-Learning	Code Available	1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classification, Benchmarking	Code Available	1
Exploring Large Language Models for Classical Philology	May 23, 2023	Benchmarking, Decoder	Code Available	1
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action Recognition, Attribute	Code Available	1
AirSim Drone Racing Lab	Mar 12, 2020	Benchmarking, Optical Flow Estimation	Code Available	1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Mar 27, 2025	Attribute, Benchmarking	Code Available	1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints	Apr 18, 2023	Benchmarking, Deep Reinforcement Learning	Code Available	1
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension	Mar 26, 2022	Benchmarking, Question Answering	Code Available	1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms	Aug 25, 2017	Benchmarking, BIG-bench Machine Learning	Code Available	1
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	Benchmarking, Decision Making	Code Available	1
featsel: A framework for benchmarking of feature selection algorithms and cost functions	Jul 19, 2017	Benchmarking, Computational Efficiency	Code Available	1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	Benchmarking, Memorization	Code Available	1
Benchmarking Adversarial Patch Against Aerial Detection	Oct 30, 2022	Benchmarking	Code Available	1
Benchmarking Data Science Agents	Feb 27, 2024	Benchmarking, Code Generation	Code Available	1
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	Code Available	1
Benchmarking Adversarial Robustness on Image Classification	Jun 1, 2020	Adversarial Attack, Adversarial Robustness	Code Available	1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods	Aug 2, 2022	Benchmarking, Causal Discovery	Code Available	1
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	Benchmarking, Hallucination	Code Available	1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	Benchmarking, Combinatorial Optimization	Code Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarking, energy management	Code Available	1

Page 16 of 111

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified