SOTAVerified

Benchmarking

Papers

Showing 1301-1310 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series | Code | 0 |
| Multi-Agent Environments for Vehicle Routing Problems | Code | 1 |
| Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Code | 0 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Code | 0 |
| Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | | 0 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Code | 5 |
| BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0 |
| The Moral Mind(s) of Large Language Models | | 0 |
Page 131 of 555

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |