Benchmarking

Papers

Showing 2051–2100 of 5548 papers

Title	Date	Tasks	Status
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation	Mar 24, 2025	Benchmarking, Data Augmentation	Unverified
Event-based Feature Extraction Using Adaptive Selection Thresholds	Jul 18, 2019	Benchmarking	Unverified
CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization	Jun 24, 2024	Bayesian Optimization, Benchmarking	Unverified
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified
Benchmarking and Comparing Multi-exposure Image Fusion Algorithms	Jul 30, 2020	Benchmarking, Multi-Exposure Image Fusion	Unverified
Cash versus Kind: Benchmarking a Child Nutrition Program against Unconditional Cash Transfers in Rwanda	Jun 1, 2021	Benchmarking, Diversity	Unverified
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	Benchmarking, Chunking	Unverified
Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted?	Jun 21, 2023	Benchmarking, Explainable artificial intelligence	Unverified
Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems	Jul 22, 2024	Benchmarking, Clustering	Unverified
Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images	Jun 11, 2024	Benchmarking, GPU	Unverified
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data	Mar 22, 2025	Benchmarking, Disease Prediction	Unverified
A Dataset for Developing and Benchmarking Active Vision	Feb 27, 2017	Benchmarking, General Classification	Unverified
Evaluation of simulation methods for tumor subclonal reconstruction	Feb 14, 2024	Benchmarking	Unverified
Capsule Neural Networks for Graph Classification using Explicit Tensorial Graph Representations	Feb 22, 2019	Benchmarking, Classification	Unverified
An approach for benchmarking the numerical solutions of stochastic compartmental models	Nov 4, 2022	Benchmarking	Unverified
Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks	Aug 1, 2023	Benchmarking	Unverified
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era	Mar 16, 2025	Benchmarking, Image Captioning	Unverified
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest	Dec 20, 2023	Benchmarking, In-Context Learning	Unverified
Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi	Jul 15, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation	Feb 10, 2025	Benchmarking	Unverified
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment	Oct 11, 2024	Benchmarking, Reinforcement Learning (RL)	Unverified
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs	Feb 16, 2025	Benchmarking	Unverified
Benchmarking and Analyzing Generative Data for Visual Recognition	Jul 25, 2023	Benchmarking, Retrieval	Unverified
A dataset for benchmarking vision-based localization at intersections	Nov 4, 2018	Benchmarking	Unverified
Evaluation of Three Welsh Language POS Taggers	Jun 1, 2022	Benchmarking, POS	Unverified
Event Camera Simulator Design for Modeling Attention-based Inference Architectures	May 3, 2021	Benchmarking	Unverified
Can time series forecasting be automated? A benchmark and analysis	Jul 23, 2024	Benchmarking, Decision Making	Unverified
Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features?	May 26, 2020	Benchmarking	Unverified
An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space	Sep 27, 2020	Benchmarking	Unverified
An Analysis of Model Robustness across Concurrent Distribution Shifts	Jan 8, 2025	Benchmarking	Unverified
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates	May 28, 2025	Benchmarking, Diversity	Unverified
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets	Apr 28, 2025	Articles, Benchmarking	Unverified
Benchmarking a (μ+λ) Genetic Algorithm with Configurable Crossover Probability	Jun 10, 2020	Benchmarking	Unverified
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind	May 18, 2025	Benchmarking, Scene Understanding	Unverified
Can Language Models Serve as Text-Based World Simulators?	Jun 10, 2024	Benchmarking, Decision Making	Unverified
Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation	Jun 6, 2024	Benchmarking, Drug Discovery	Unverified
Evaluation Methods and Measures for Causal Learning Algorithms	Feb 7, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization	Sep 29, 2021	Bayesian Optimization, Benchmarking	Unverified
Can humans help BERT gain "confidence"?	Aug 31, 2023	Benchmarking, EEG	Unverified
An Analysis of Control Parameters of MOEA/D Under Two Different Optimization Scenarios	Oct 2, 2020	Benchmarking, Evolutionary Algorithms	Unverified
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging	May 2, 2025	Benchmarking, Computational Efficiency	Unverified
Benchmarking Algorithms for Automatic License Plate Recognition	Mar 27, 2022	Benchmarking, License Plate Recognition	Unverified
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate	May 22, 2023	Benchmarking, Math	Unverified
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging	Jan 15, 2025	Benchmarking, Computational Efficiency	Unverified
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time	Dec 1, 2023	Articles, Benchmarking	Unverified
A Dataset for Benchmarking Image-Based Localization	Jul 1, 2017	Benchmarking, Image-Based Localization	Unverified
Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge	Feb 21, 2019	Anatomy, Benchmarking	Unverified
Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis	Apr 9, 2025	Benchmarking	Unverified
Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation	Aug 10, 2023	Attribute, Benchmarking	Unverified
Evaluating the Performance of Large Language Models via Debates	Jun 16, 2024	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified