SOTAVerified

Benchmarking

Papers

Showing 2051–2075 of 5548 papers

Title | Status | Hype
Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery | | 0
Benchmarking Deep Learning-Based Methods for Irradiance Nowcasting with Sky Images | | 0
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition | | 0
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers? | Code | 0
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance | | 0
GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | | 0
CSPO: Cross-Market Synergistic Stock Price Movement Forecasting with Pseudo-volatility Optimization | | 0
Benchmarking and optimizing organism wide single-cell RNA alignment methods | Code | 0
Can geometric combinatorics improve RNA branching predictions? | Code | 0
RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy | | 0
Benchmarking Machine Learning Methods for Distributed Acoustic Sensing | | 0
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy | | 0
Reservoir Computing with a Single Oscillating Gas Bubble: Emphasizing the Chaotic Regime | | 0
Writing as a testbed for open ended agents | | 0
Benchmarking Burst Super-Resolution for Polarization Images: Noise Dataset and Analysis | | 0
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | | 0
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | | 0
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Code | 0
Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages | | 0
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages | Code | 0
Regularization of ML models for Earth systems by using longer model timesteps | | 0
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | | 0
A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives | | 0
Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning | Code | 0
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified