Benchmarking

Papers


Showing 2051–2075 of 5548 papers

Title	Date	Tasks	Status
Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors	Feb 12, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data	May 29, 2024	Benchmarking, Specificity	Unverified
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans	Jun 22, 2024	Benchmarking, Decision Making	Unverified
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking	Apr 29, 2025	Benchmarking, Intrusion Detection	Unverified
Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study	Aug 26, 2024	8k, Benchmarking	Unverified
CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization	Jun 24, 2024	Bayesian Optimization, Benchmarking	Unverified
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection	Dec 11, 2023	Benchmarking, Domain Adaptation	Unverified
Benchmarking and Comparing Multi-exposure Image Fusion Algorithms	Jul 30, 2020	Benchmarking, Multi-Exposure Image Fusion	Unverified
Cash versus Kind: Benchmarking a Child Nutrition Program against Unconditional Cash Transfers in Rwanda	Jun 1, 2021	Benchmarking, Diversity	Unverified
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	Benchmarking, Chunking	Unverified
Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models	Apr 29, 2024	Benchmarking, Clustering	Unverified
Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems	Jul 22, 2024	Benchmarking, Clustering	Unverified
Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images	Jun 11, 2024	Benchmarking, GPU	Unverified
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data	Mar 22, 2025	Benchmarking, Disease Prediction	Unverified
A Dataset for Developing and Benchmarking Active Vision	Feb 27, 2017	Benchmarking, General Classification	Unverified
Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy	May 9, 2025	Benchmarking, Sentiment Analysis	Unverified
Capsule Neural Networks for Graph Classification using Explicit Tensorial Graph Representations	Feb 22, 2019	Benchmarking, Classification	Unverified
An approach for benchmarking the numerical solutions of stochastic compartmental models	Nov 4, 2022	Benchmarking	Unverified
Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks	Aug 1, 2023	Benchmarking	Unverified
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era	Mar 16, 2025	Benchmarking, Image Captioning	Unverified
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest	Dec 20, 2023	Benchmarking, In-Context Learning	Unverified
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation	Feb 10, 2025	Benchmarking	Unverified
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment	Oct 11, 2024	Benchmarking, Reinforcement Learning (RL)	Unverified
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs	Feb 16, 2025	Benchmarking	Unverified
Benchmarking and Analyzing Generative Data for Visual Recognition	Jul 25, 2023	Benchmarking, Retrieval	Unverified


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified