Benchmarking

Papers

Showing 2076–2100 of 5548 papers

Title	Date	Tasks	Status
Event Camera Simulator Design for Modeling Attention-based Inference Architectures	May 3, 2021	Benchmarking	Unverified
Can time series forecasting be automated? A benchmark and analysis	Jul 23, 2024	Benchmarking, Decision Making	Unverified
Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features?	May 26, 2020	Benchmarking	Unverified
An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space	Sep 27, 2020	Benchmarking	Unverified
An Analysis of Model Robustness across Concurrent Distribution Shifts	Jan 8, 2025	Benchmarking	Unverified
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates	May 28, 2025	Benchmarking, Diversity	Unverified
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets	Apr 28, 2025	Articles, Benchmarking	Unverified
Benchmarking a (μ+λ) Genetic Algorithm with Configurable Crossover Probability	Jun 10, 2020	Benchmarking	Unverified
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind	May 18, 2025	Benchmarking, Scene Understanding	Unverified
Can Language Models Serve as Text-Based World Simulators?	Jun 10, 2024	Benchmarking, Decision Making	Unverified
Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation	Jun 6, 2024	Benchmarking, Drug Discovery	Unverified
Evaluation Methods and Measures for Causal Learning Algorithms	Feb 7, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization	Sep 29, 2021	Bayesian Optimization, Benchmarking	Unverified
Can humans help BERT gain "confidence"?	Aug 31, 2023	Benchmarking, EEG	Unverified
An Analysis of Control Parameters of MOEA/D Under Two Different Optimization Scenarios	Oct 2, 2020	Benchmarking, Evolutionary Algorithms	Unverified
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging	May 2, 2025	Benchmarking, Computational Efficiency	Unverified
Benchmarking Algorithms for Automatic License Plate Recognition	Mar 27, 2022	Benchmarking, License Plate Recognition	Unverified
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate	May 22, 2023	Benchmarking, Math	Unverified
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging	Jan 15, 2025	Benchmarking, Computational Efficiency	Unverified
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time	Dec 1, 2023	Articles, Benchmarking	Unverified
A Dataset for Benchmarking Image-Based Localization	Jul 1, 2017	Benchmarking, Image-Based Localization	Unverified
Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge	Feb 21, 2019	Anatomy, Benchmarking	Unverified
Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis	Apr 9, 2025	Benchmarking	Unverified
Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation	Aug 10, 2023	Attribute, Benchmarking	Unverified
Evaluating the Performance of Large Language Models via Debates	Jun 16, 2024	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified