SOTAVerified

Benchmarking

Papers

Showing 111–120 of 5548 papers

Title | Status | Hype
Advancing LLM Reasoning Generalists with Preference Trees | Code | 3
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework | Code | 3
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection | Code | 3
Recurrent Drafter for Fast Speculative Decoding in Large Language Models | Code | 3
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | Code | 3
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | Code | 3
Benchmarking LLMs via Uncertainty Quantification | Code | 3
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Code | 3
SEED-Bench: Benchmarking Multimodal Large Language Models | Code | 3
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified