SOTAVerified

Benchmarking

Papers

Showing 2921-2930 of 5548 papers

Title | Status | Hype
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation | Code | 0
Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench | Code | 1
A New Real-World Video Dataset for the Comparison of Defogging Algorithms | - | 0
NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation | Code | 1
TRAM: Benchmarking Temporal Reasoning for Large Language Models | - | 0
CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems | - | 0
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | Code | 2
Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method | - | 0
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified