SOTAVerified

Benchmarking

Papers

Showing 671-680 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models | | 0 |
| MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation | | 0 |
| Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains | Code | 0 |
| CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | | 0 |
| RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations | | 0 |
| SimBank: from Simulation to Solution in Prescriptive Process Monitoring | | 0 |
| Generalization Bias in Large Language Model Summarization of Scientific Research | | 0 |
| LIM: Large Interpolator Model for Dynamic Reconstruction | | 0 |
| An Advanced Ensemble Deep Learning Framework for Stock Price Prediction Using VAE, Transformer, and LSTM Model | | 0 |
| Benchmarking Ultra-Low-Power μNPUs | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |