SOTAVerified

Benchmarking

Papers

Showing 2041-2050 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | | 0 |
| Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains | Code | 0 |
| CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | | 0 |
| MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation | | 0 |
| Generalization Bias in Large Language Model Summarization of Scientific Research | | 0 |
| An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model | | 0 |
| LIM: Large Interpolator Model for Dynamic Reconstruction | | 0 |
| Benchmarking Ultra-Low-Power μNPUs | | 0 |
| Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery | | 0 |
| Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors | Code | 0 |
Page 205 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |