Benchmarking

Papers

Showing 331–340 of 5548 papers

Title	Date	Tasks	Status	Hype
Customizable Perturbation Synthesis for Robust SLAM Benchmarking	Feb 12, 2024	Benchmarking, Simultaneous Localization and Mapping	Code Available	2
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval	May 5, 2024	Benchmarking, Composed Image Retrieval (CoIR)	Code Available	2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation	Jun 24, 2024	Benchmarking, Image Generation	Code Available	2
K-LITE: Learning Transferable Visual Models with External Knowledge	Apr 20, 2022	Benchmarking, Descriptive	Code Available	2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation	Oct 30, 2024	Benchmarking, Passage Retrieval	Code Available	2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act	Oct 10, 2024	Benchmarking, Fairness	Code Available	2
Commit0: Library Generation from Scratch	Dec 2, 2024	Benchmarking, Code Generation	Code Available	2
CoqPilot, a plugin for LLM-based generation of proofs	Oct 25, 2024	Benchmarking	Code Available	2
Benchmarking Benchmark Leakage in Large Language Models	Apr 29, 2024	Benchmarking, Mathematical Reasoning	Code Available	2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments	Jul 4, 2024	Benchmarking, Minecraft	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified