Benchmarking

Papers

Showing 1151–1200 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift	Dec 15, 2022	Benchmarking, Image Captioning	Code Available	1	5
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Feb 28, 2024	Benchmarking, Multiple-choice	Code Available	1	5
A Comparative Attention Framework for Better Few-Shot Object Detection on Aerial Images	Oct 25, 2022	Benchmarking, Few-Shot Object Detection	Code Available	1	5
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions	Jan 1, 2024	Benchmarking, Instruction Following	Code Available	1	5
Benchmarking structure-based three-dimensional molecular generative models using GenBench3D: ligand conformation quality matters	Jul 5, 2024	Benchmarking, valid	Code Available	1	5
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	Code Available	1	5
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning	Jul 22, 2024	Benchmarking, Hallucination	Code Available	1	5
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	Benchmarking, Prediction	Code Available	1	5
HINT3: Raising the bar for Intent Detection in the Wild	Sep 29, 2020	Benchmarking, Intent Detection	Code Available	1	5
A global analysis of metrics used for measuring performance in natural language processing	Apr 25, 2022	Benchmarking, Machine Translation	Code Available	1	5
A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models	Mar 31, 2023	Benchmarking, Causal Discovery	Code Available	1	5
BiBench: Benchmarking and Analyzing Network Binarization	Jan 26, 2023	Benchmarking, Binarization	Code Available	1	5
A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging	Apr 26, 2020	Benchmarking, Left Atrium Segmentation	Code Available	1	5
Benchmarking Multidomain English-Indonesian Machine Translation	May 1, 2020	Benchmarking, Machine Translation	Code Available	1	5
Automatic Detection of Generated Text is Easiest when Humans are Fooled	Nov 2, 2019	Benchmarking, Language Modelling	Code Available	1	5
RGB-D Indiscernible Object Counting in Underwater Scenes	Apr 23, 2023	Benchmarking, Depth Estimation	Code Available	1	5
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Feb 27, 2024	Benchmarking	Code Available	1	5
GRecX: An Efficient and Unified Benchmark for GNN-based Recommendation	Nov 19, 2021	Benchmarking, Management	Code Available	1	5
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	Benchmarking, News Summarization	Code Available	1	5
Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus	Oct 18, 2022	ARC, Benchmarking	Code Available	1	5
GraphWorld: Fake Graphs Bring Real Insights for GNNs	Feb 28, 2022	Benchmarking	Code Available	1	5
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models	Dec 15, 2023	Benchmarking, Code Summarization	Code Available	1	5
GraphGallery: A Platform for Fast Benchmarking and Easy Development of Graph Neural Networks Based Intelligent Software	Feb 16, 2021	Benchmarking	Code Available	1	5
Biomedical Data-to-Text Generation via Fine-Tuning Transformers	Sep 3, 2021	Benchmarking, Data-to-Text Generation	Code Available	1	5
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking	Jun 15, 2024	Benchmarking, GPU	Code Available	1	5
Graph Neural Network-Based Anomaly Detection for River Network Systems	Apr 19, 2023	Anomaly Detection, Benchmarking	Code Available	1	5
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	Benchmarking, Feature Engineering	Code Available	1	5
BLADE: Benchmarking Language Model Agents for Data-Driven Science	Aug 19, 2024	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Simulation-Based Inference	Jan 12, 2021	Benchmarking	Code Available	1	5
Benchmarking Visual Localization for Autonomous Navigation	Mar 24, 2022	Autonomous Navigation, Benchmarking	Code Available	1	5
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1	5
Benchmarking Multi-Scene Fire and Smoke Detection	Oct 22, 2024	Benchmarking	Code Available	1	5
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses	Mar 3, 2025	Benchmarking	Code Available	1	5
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions	May 27, 2022	Benchmarking, Few-Shot Image Classification	Code Available	1	5
Boosting Healthcare LLMs Through Retrieved Context	Sep 23, 2024	Benchmarking, Multiple-choice	Code Available	1	5
Boosting Neural Image Compression for Machines Using Latent Space Masking	Dec 15, 2021	Benchmarking, Image Compression	Code Available	1	5
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	Benchmarking, Hallucination	Code Available	1	5
Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning	Nov 8, 2021	Adversarial Robustness, Benchmarking	Code Available	1	5
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text	Apr 28, 2025	Benchmarking	Code Available	1	5
Grounding Descriptions in Images informs Zero-Shot Visual Recognition	Dec 5, 2024	Attribute, Benchmarking	Code Available	1	5
AI Accelerator Survey and Trends	Sep 18, 2021	Benchmarking, Computational Efficiency	Code Available	1	5
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset	Jun 14, 2022	Benchmarking, Ischemic Stroke Lesion Segmentation	Code Available	1	5
Benchmarking Neural Network Generalization for Grammar Induction	Aug 16, 2023	Benchmarking	Code Available	1	5
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial Defense, Benchmarking	Code Available	1	5
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing	Mar 2, 2024	Attribute, Benchmarking	Code Available	1	5
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond	Jun 16, 2023	Benchmarking, Evidence Selection	Code Available	1	5
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1	5
GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking	Oct 3, 2023	Benchmarking, counterfactual	Code Available	1	5
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking	May 28, 2025	Benchmarking, Text Spotting	Code Available	1	5
GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models	Jul 3, 2024	Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified