Benchmarking

Papers

Showing 2451–2500 of 5548 papers

Title	Date	Tasks	Status	Score
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Jan 22, 2025	Benchmarking	Code Available	5
EvoLearner: Learning Description Logics with Evolutionary Algorithms	Nov 8, 2021	Benchmarking, Evolutionary Algorithms	Code Available	5
Graph Convolutional Networks Meet with High Dimensionality Reduction	Nov 7, 2019	Benchmarking, Dimensionality Reduction	Code Available	5
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset	Feb 8, 2024	Benchmarking	Code Available	5
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo	Mar 14, 2019	Benchmarking, OpenAI Gym	Code Available	5
Strong and Simple Baselines for Multimodal Utterance Embeddings	May 14, 2019	Benchmarking	Code Available	5
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization	Jul 18, 2022	Benchmarking	Code Available	5
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams	Jun 17, 2024	All, Benchmarking	Code Available	5
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	Benchmarking, Computational Efficiency	Code Available	5
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	Benchmarking, Fairness	Code Available	5
LoopDB: A Loop Closure Dataset for Large Scale Simultaneous Localization and Mapping	Jun 7, 2025	Benchmarking, Simultaneous Localization and Mapping	Code Available	5
Benchmarking Large Language Models for Math Reasoning Tasks	Aug 20, 2024	Benchmarking, In-Context Learning	Code Available	5
Benchmarking Large Language Models for Image Classification of Marine Mammals	Oct 22, 2024	Benchmarking, image-classification	Code Available	5
Divergent Creativity in Humans and Large Language Models	May 13, 2024	Benchmarking	Code Available	5
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks	Jan 7, 2024	Benchmarking, Graph Neural Network	Code Available	5
Distributional Depth-Based Estimation of Object Articulation Models	Aug 12, 2021	Benchmarking, Object	Code Available	5
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation	Oct 29, 2021	Benchmarking, Brain Tumor Segmentation	Code Available	5
A Framework for Generating Informative Benchmark Instances	May 29, 2022	Benchmarking	Code Available	5
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection	Aug 22, 2023	Benchmarking, Out-of-Distribution Detection	Code Available	5
Experimental Analysis of Large-scale Learnable Vector Storage Compression	Nov 27, 2023	Benchmarking	Code Available	5
Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization	Apr 4, 2024	Benchmarking	Code Available	5
AI-generated Image Quality Assessment in Visual Communication	Dec 20, 2024	Benchmarking, Image Quality Assessment	Code Available	5
Geological Inference from Textual Data using Word Embeddings	Apr 10, 2025	Benchmarking, Word Embeddings	Code Available	5
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	Benchmarking, Diversity	Code Available	5
AstroVision: Towards Autonomous Feature Detection and Description for Missions to Small Bodies Using Deep Learning	Aug 3, 2022	Benchmarking	Code Available	5
Machine learning classification of non-Markovian noise disturbing quantum dynamics	Jan 8, 2021	Benchmarking, BIG-bench Machine Learning	Code Available	5
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available	5
A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice	Dec 20, 2024	Benchmarking, Diagnostic	Code Available	5
Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability	Feb 18, 2020	Benchmarking, Federated Learning	Code Available	5
Flexible Generation of Preference Data for Recommendation Analysis	Jul 23, 2024	Benchmarking, Recommendation Systems	Code Available	5
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI	Mar 7, 2024	Benchmarking	Code Available	5
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions	Aug 2, 2024	Benchmarking, multimodal interaction	Code Available	5
Benchmarking Large Language Models for Molecule Prediction Tasks	Mar 8, 2024	Benchmarking, Prediction	Code Available	5
DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions	May 8, 2025	Autonomous Navigation, Benchmarking	Code Available	5
Are Large Language Models Good at Utility Judgments?	Mar 28, 2024	Answer Generation, Benchmarking	Code Available	5
Benchmarking performance of object detection under image distortions in an uncontrolled environment	Oct 28, 2022	Benchmarking, Object	Code Available	5
DispaRisk: Auditing Fairness Through Usable Information	May 20, 2024	Benchmarking, Bias Detection	Code Available	5
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making	Sep 9, 2024	Benchmarking, Decision Making	Code Available	5
Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark	Jun 30, 2021	Benchmarking, Prediction	Code Available	5
Benchmarking Perturbation-based Saliency Maps for Explaining Atari Agents	Jan 18, 2021	Atari Games, Benchmarking	Code Available	5
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	Benchmarking, GPU	Code Available	5
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking	May 24, 2023	Benchmarking, Graph Mining	Code Available	5
Exploring Model-based Planning with Policy Networks	Jun 20, 2019	Benchmarking, model	Code Available	5
GenderBench: Evaluation Suite for Gender Biases in LLMs	May 17, 2025	Benchmarking	Code Available	5
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	Code Available	5
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms	Jul 1, 2022	Benchmarking, Classification	Code Available	5
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations	Jun 17, 2024	Benchmarking, Dataset Generation	Code Available	5
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting	Apr 15, 2024	Benchmarking	Code Available	5
Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters	Jun 16, 2024	Benchmarking, Instance Segmentation	Code Available	5
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	Benchmarking, Segmentation	Code Available	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified