SOTAVerified

Benchmarking

Papers

Showing 2071-2080 of 5548 papers

Title | Status | Hype
SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution | Code | 1
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | - | 0
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | - | 0
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Code | 3
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | Code | 1
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations | Code | 0
MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents | - | 0
Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework | Code | 1
It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives | - | 0
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified