SOTAVerified

Benchmarking

Papers

Showing 2076–2100 of 5548 papers

Title | Status | Hype
Event Camera Simulator Design for Modeling Attention-based Inference Architectures | | 0
Can time series forecasting be automated? A benchmark and analysis | | 0
Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features? | | 0
An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space | | 0
An Analysis of Model Robustness across Concurrent Distribution Shifts | | 0
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | | 0
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets | | 0
Benchmarking a (μ+λ) Genetic Algorithm with Configurable Crossover Probability | | 0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0
Can Language Models Serve as Text-Based World Simulators? | | 0
Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy changes upon mutation | | 0
Evaluation Methods and Measures for Causal Learning Algorithms | | 0
Benchmarking Algorithms from Machine Learning for Low-Budget Black-Box Optimization | | 0
Can humans help BERT gain "confidence"? | | 0
An Analysis of Control Parameters of MOEA/D Under Two Different Optimization Scenarios | | 0
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging | | 0
Benchmarking Algorithms for Automatic License Plate Recognition | | 0
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate | | 0
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging | | 0
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time | | 0
A Dataset for Benchmarking Image-Based Localization | | 0
Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge | | 0
Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis | | 0
Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation | | 0
Evaluating the Performance of Large Language Models via Debates | | 0
Page 84 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified