SOTAVerified

Benchmarking

Papers

Showing 2921–2930 of 5548 papers

Title	Date	Tasks	Status	Hype
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation	Oct 2, 2023	Benchmarking, Continual Learning	Code Available	0
Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench	Oct 2, 2023	Benchmarking, Safety Alignment	Code Available	1
A New Real-World Video Dataset for the Comparison of Defogging Algorithms	Oct 2, 2023	Benchmarking, Deblurring	Unverified	0
NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation	Oct 2, 2023	Benchmarking, News Recommendation	Code Available	1
TRAM: Benchmarking Temporal Reasoning for Large Language Models	Oct 2, 2023	Benchmarking, Few-Shot Learning	Unverified	0
CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems	Oct 2, 2023	Benchmarking, Computational Efficiency	Unverified	0
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models	Oct 1, 2023	Benchmarking	Code Available	2
Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method	Sep 30, 2023	Benchmarking, Reinforcement Learning (RL)	Unverified	0
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks	Sep 30, 2023	Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified