SOTAVerified

Benchmarking

Papers

Showing 331–340 of 5548 papers

Title | Status | Hype
Customizable Perturbation Synthesis for Robust SLAM Benchmarking | Code | 2
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | Code | 2
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | Code | 2
K-LITE: Learning Transferable Visual Models with External Knowledge | Code | 2
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | Code | 2
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act | Code | 2
Commit0: Library Generation from Scratch | Code | 2
CoqPilot, a plugin for LLM-based generation of proofs | Code | 2
Benchmarking Benchmark Leakage in Large Language Models | Code | 2
Craftium: An Extensible Framework for Creating Reinforcement Learning Environments | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified