SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1101–1110 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Dec 11, 2024	AttributeBenchmarking	CodeCode Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	BenchmarkingMultiple-choice	CodeCode Available	1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification	Jun 20, 2024	BenchmarkingClassification	CodeCode Available	1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control	Jun 5, 2024	Benchmarkingenergy management	CodeCode Available	1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	BenchmarkingRetrieval	CodeCode Available	1
A Review and Efficient Implementation of Scene Graph Generation Metrics	Apr 15, 2024	BenchmarkingGraph Generation	CodeCode Available	1
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action RecognitionBenchmarking	CodeCode Available	1
Comprehensive benchmarking of large language models for RNA secondary structure prediction	Oct 21, 2024	Benchmarking	CodeCode Available	1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	CodeCode Available	1

Show:10 25 50

← PrevPage 111 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified