SOTAVerified

Benchmarking

Papers

Showing 171–180 of 5548 papers

Title | Status | Hype
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark | Code | 2
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Code | 2
Medical Hallucinations in Foundation Models and Their Impact on Healthcare | Code | 2
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts | Code | 2
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Code | 2
FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis | Code | 2
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Code | 2
SoK: Benchmarking Poisoning Attacks and Defenses in Federated Learning | Code | 2
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation | Code | 2
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | Code | 2
Page 18 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified