SOTAVerified

Benchmarking

Papers

Showing 2651–2660 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0 |
| Benchmarking projective simulation in navigation problems | | 0 |
| Foundations for learning from noisy quantum experiments | | 0 |
| Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | | 0 |
| A Survey on LLM-based News Recommender Systems | | 0 |
| HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | | 0 |
| Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning | | 0 |
| Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization | | 0 |
| FRED: The Florence RGB-Event Drone Dataset | | 0 |
| Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |