SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 311–320 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Datasets and Benchmarks for Offline Safe Reinforcement Learning	Jun 15, 2023	Autonomous DrivingBenchmarking	CodeCode Available	2	5
CoqPilot, a plugin for LLM-based generation of proofs	Oct 25, 2024	Benchmarking	CodeCode Available	2	5
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models	Jun 18, 2024	BenchmarkingDepth Estimation	CodeCode Available	2	5
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks	Nov 28, 2024	BenchmarkingObject Counting	CodeCode Available	2	5
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark	Mar 10, 2025	Autonomous DrivingBenchmarking	CodeCode Available	2	5
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act	Oct 10, 2024	BenchmarkingFairness	CodeCode Available	2	5
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension	Feb 12, 2024	2kAutomatic Speech Recognition	CodeCode Available	2	5
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization	Sep 24, 2024	3D geometry3DGS	CodeCode Available	2	5
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly	May 15, 2025	8kBenchmarking	CodeCode Available	2	5
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation	Oct 30, 2024	BenchmarkingPassage Retrieval	CodeCode Available	2	5

Show:10 25 50

← PrevPage 32 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified