SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1491–1500 of 5548 papers

Title	Date	Tasks	Status	Hype
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	Oct 14, 2024	2kBenchmarking	CodeCode Available	1
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning	Oct 14, 2024	Atari GamesBenchmarking	—Unverified	0
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment	Oct 13, 2024	Benchmarking	CodeCode Available	1
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models	Oct 13, 2024	BenchmarkingGraph Generation	CodeCode Available	2
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Oct 13, 2024	Autonomous DrivingAutonomous Vehicles	CodeCode Available	1
Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models	Oct 12, 2024	BenchmarkingMisinformation	CodeCode Available	0
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English	Oct 12, 2024	Benchmarking	CodeCode Available	0
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback	Oct 12, 2024	Benchmarking	CodeCode Available	0
A Comparative Analysis on Ethical Benchmarking in Large Language Models	Oct 11, 2024	BenchmarkingDecision Making	—Unverified	0
Enterprise Benchmarks for Large Language Model Evaluation	Oct 11, 2024	BenchmarkingLanguage Model Evaluation	CodeCode Available	0

Show:10 25 50

← PrevPage 150 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified