SOTAVerified

Benchmarking

Papers

Showing 2051–2075 of 5548 papers

Title | Status | Hype
Causal Analysis of ASR Errors for Children: Quantifying the Impact of Physiological, Cognitive, and Extrinsic Factors | | 0
Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data | | 0
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans | | 0
Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | | 0
Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | | 0
CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization | | 0
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | | 0
Benchmarking and Comparing Multi-exposure Image Fusion Algorithms | | 0
Cash versus Kind: Benchmarking a Child Nutrition Program against Unconditional Cash Transfers in Rwanda | | 0
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | | 0
Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | | 0
Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | | 0
Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images | | 0
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data | | 0
A Dataset for Developing and Benchmarking Active Vision | | 0
Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy | | 0
Capsule Neural Networks for Graph Classification using Explicit Tensorial Graph Representations | | 0
An approach for benchmarking the numerical solutions of stochastic compartmental models | | 0
Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks | | 0
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era | | 0
Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest | | 0
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation | | 0
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | | 0
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | | 0
Benchmarking and Analyzing Generative Data for Visual Recognition | | 0
Page 83 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified