SOTAVerified

Benchmarking

Papers

Showing 2051–2100 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | | 0 |
| Event-based Feature Extraction Using Adaptive Selection Thresholds | | 0 |
| CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization | | 0 |
| Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection | | 0 |
| Benchmarking and Comparing Multi-exposure Image Fusion Algorithms | | 0 |
| Cash versus Kind: Benchmarking a Child Nutrition Program against Unconditional Cash Transfers in Rwanda | | 0 |
| Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT | | 0 |
| Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted? | | 0 |
| Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems | | 0 |
| Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images | | 0 |
| CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data | | 0 |
| A Dataset for Developing and Benchmarking Active Vision | | 0 |
| Evaluation of simulation methods for tumor subclonal reconstruction | | 0 |
| Capsule Neural Networks for Graph Classification using Explicit Tensorial Graph Representations | | 0 |
| An approach for benchmarking the numerical solutions of stochastic compartmental models | | 0 |
| Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks | | 0 |
| CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era | | 0 |
| Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest | | 0 |
| Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi | | 0 |
| Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation | | 0 |
| Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | | 0 |
| Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs | | 0 |
| Benchmarking and Analyzing Generative Data for Visual Recognition | | 0 |
| A dataset for benchmarking vision-based localization at intersections | | 0 |
| Evaluation of Three Welsh Language POS Taggers | | 0 |
| Event Camera Simulator Design for Modeling Attention-based Inference Architectures | | 0 |
| Can time series forecasting be automated? A benchmark and analysis | | 0 |
| Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features? | | 0 |
| An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space | | 0 |
| An Analysis of Model Robustness across Concurrent Distribution Shifts | | 0 |
| Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | | 0 |
| Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets | | 0 |
| Benchmarking a (μ+λ) Genetic Algorithm with Configurable Crossover Probability | | 0 |
| Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0 |
| Can Language Models Serve as Text-Based World Simulators? | | 0 |
| Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation | | 0 |
| Evaluation Methods and Measures for Causal Learning Algorithms | | 0 |
| Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization | | 0 |
| Can humans help BERT gain "confidence"? | | 0 |
| An Analysis of Control Parameters of MOEA/D Under Two Different Optimization Scenarios | | 0 |
| Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | | 0 |
| Benchmarking Algorithms for Automatic License Plate Recognition | | 0 |
| Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | | 0 |
| Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | | 0 |
| Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time | | 0 |
| A Dataset for Benchmarking Image-Based Localization | | 0 |
| Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge | | 0 |
| Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis | | 0 |
| Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation | | 0 |
| Evaluating the Performance of Large Language Models via Debates | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |