SOTAVerified

Benchmarking

Papers

Showing 231-240 of 5548 papers

Title | Status | Hype
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Code | 2
EasyTPP: Towards Open Benchmarking Temporal Point Processes | Code | 2
Fast Vision Transformers with HiLo Attention | Code | 2
Benchmarking Deep Reinforcement Learning for Continuous Control | Code | 2
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
Desbordante: from benchmarking suite to high-performance science-intensive data profiler (preprint) | Code | 2
Deep Visual Geo-localization Benchmark | Code | 2
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | Code | 2
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning | Code | 2
A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified