SOTAVerified

Benchmarking

Papers

Showing 281-290 of 5548 papers

Title | Status | Hype
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Code | 2
Deep Visual Geo-localization Benchmark | Code | 2
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer | Code | 2
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models | Code | 2
DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Code | 2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2
CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | Code | 2
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified