
Benchmarking

Papers


Showing 761–770 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A multi-schematic classifier-independent oversampling approach for imbalanced datasets	Jul 15, 2021	Benchmarking	Code Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1	5
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	Aug 11, 2023	Benchmarking, Diversity	Code Available	1	5
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization	May 27, 2025	Benchmarking	Code Available	1	5
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones	Nov 5, 2023	Benchmarking	Code Available	1	5
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models	Dec 5, 2023	Benchmarking, Visual Question Answering	Code Available	1	5
AirSim Drone Racing Lab	Mar 12, 2020	Benchmarking, Optical Flow Estimation	Code Available	1	5
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	Benchmarking, Multiple-choice	Code Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified