Benchmarking

Papers

Showing 4901–4950 of 5548 papers

Title	Date	Tasks	Status
Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks	Mar 4, 2024	Benchmarking	Code Available
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection	May 1, 2022	Benchmarking, Hate Speech Detection	Code Available
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates	Jun 2, 2022	Benchmarking	Code Available
Benchmarking Positional Encodings for GNNs and Graph Transformers	Nov 19, 2024	Benchmarking	Code Available
Fast and accurate alignment of long bisulfite-seq reads	Jan 6, 2014	Benchmarking	Code Available
Benchmarking Popular Classification Models' Robustness to Random and Targeted Corruptions	Jan 31, 2020	Benchmarking, Classification	Code Available
False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims	May 7, 2025	Benchmarking	Code Available
Benchmarking Perturbation-based Saliency Maps for Explaining Atari Agents	Jan 18, 2021	Atari Games, Benchmarking	Code Available
Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains	Mar 29, 2025	Anomaly Detection, Benchmarking	Code Available
Benchmarking person re-identification datasets and approaches for practical real-world implementations	Dec 20, 2022	Benchmarking, Pedestrian Detection	Code Available
FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs	Dec 27, 2023	Benchmarking, GPU	Code Available
FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability	Jun 20, 2024	Benchmarking, Fairness	Code Available
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News	Apr 21, 2024	Benchmarking, Emotion Recognition	Code Available
Benchmarking performance of object detection under image distortions in an uncontrolled environment	Oct 28, 2022	Benchmarking, Object	Code Available
GUNNEL: Guided Mixup Augmentation and Multi-View Fusion for Aquatic Animal Segmentation	Dec 12, 2021	Benchmarking, Instance Segmentation	Code Available
Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models	May 6, 2025	Benchmarking, Image Generation	Code Available
Segmenting France Across Four Centuries	May 30, 2025	Benchmarking, Image-to-Image Translation	Code Available
Audio Explanation Synthesis with Generative Foundation Models	Oct 10, 2024	Benchmarking, Decision Making	Code Available
Benchmarking Tropical Cyclone Rapid Intensification with Satellite Images and Attention-based Deep Models	Sep 25, 2019	Benchmarking, Deep Learning	Code Available
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes	Jun 3, 2025	Benchmarking, Feature Engineering	Code Available
Can LLMs perform structured graph reasoning?	Feb 2, 2024	Benchmarking, Navigate	Code Available
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors	Mar 14, 2024	Benchmarking, Domain Adaptation	Code Available
Exploring Model-based Planning with Policy Networks	Jun 20, 2019	Benchmarking, model	Code Available
Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark	Jun 30, 2021	Benchmarking, Prediction	Code Available
Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test	Mar 8, 2023	Benchmarking, Time Series	Code Available
Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation	Jul 6, 2019	Benchmarking, Domain Adaptation	Code Available
Zero-shot generation of synthetic neurosurgical data with large language models	Feb 13, 2025	Benchmarking, De-identification	Code Available
Benchmarking Pathology Foundation Models: Adaptation Strategies and Scenarios	Oct 21, 2024	Benchmarking, Few-Shot Learning	Code Available
Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks	Mar 6, 2024	Anomaly Detection, Benchmarking	Code Available
Multiple Instance Learning: A Survey of Problem Characteristics and Applications	Dec 11, 2016	Benchmarking, Document Classification	Code Available
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization	Jun 7, 2023	Bayesian Optimization, Benchmarking	Code Available
Multiple Light Source Dataset for Colour Research	Aug 16, 2019	Benchmarking, Image Segmentation	Code Available
Experimental Analysis of Large-scale Learnable Vector Storage Compression	Nov 27, 2023	Benchmarking	Code Available
Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization	Apr 4, 2024	Benchmarking	Code Available
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	Benchmarking, HumanEval	Code Available
Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process	Aug 22, 2023	Benchmarking, Domain Adaptation	Code Available
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks	Mar 5, 2025	Benchmarking, graph construction	Code Available
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection	Aug 22, 2023	Benchmarking, Out-of-Distribution Detection	Code Available
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem	Feb 11, 2025	Benchmarking, Diversity	Code Available
Benchmarking optimality of time series classification methods in distinguishing diffusions	Jan 30, 2023	Benchmarking, Gaussian Processes	Code Available
ExEBench: Benchmarking Foundation Models on Extreme Earth Events	May 13, 2025	Benchmarking, Management	Code Available
MULTITAT: Benchmarking Multilingual Table-and-Text Question Answering	Feb 24, 2025	Benchmarking, Question Answering	Code Available
Evolving Evolutionary Algorithms with Patterns	Oct 10, 2021	Benchmarking, Evolutionary Algorithms	Code Available
Semantic Hilbert Space for Text Representation Learning	Feb 26, 2019	Benchmarking, General Classification	Code Available
A Continuous Information Gain Measure to Find the Most Discriminatory Problems for AI Benchmarking	Sep 9, 2018	Benchmarking, Game Design	Code Available
Timage -- A Robust Time Series Classification Pipeline	Sep 19, 2019	Benchmarking, Classification	Code Available
AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection	Feb 6, 2024	Benchmarking	Code Available
EvoLearner: Learning Description Logics with Evolutionary Algorithms	Nov 8, 2021	Benchmarking, Evolutionary Algorithms	Code Available
Evidential Deep Learning for Uncertainty Quantification and Out-of-Distribution Detection in Jet Identification using Deep Neural Networks	Jan 10, 2025	Anomaly Detection, Benchmarking	Code Available
Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data	Aug 3, 2024	Benchmarking, Knowledge Graphs	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified