SOTAVerified

Benchmarking

Papers

Showing 571–580 of 5548 papers

Title | Status | Hype
Multi-Agent Environments for Vehicle Routing Problems | Code | 1
StackEval: Benchmarking LLMs in Coding Assistance | Code | 1
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models | Code | 1
Introducing Milabench: Benchmarking Accelerators for AI | Code | 1
FM-TS: Flow Matching for Time Series Generation | Code | 1
Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification | Code | 1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset | Code | 1
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Code | 1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
ROAD-Waymo: Action Awareness at Scale for Autonomous Driving | Code | 1
Page 58 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified