SOTAVerified

Benchmarking

Papers

Showing 111–120 of 5548 papers

Title	Date	Tasks	Status	Hype
Advancing LLM Reasoning Generalists with Preference Trees	Apr 2, 2024	Benchmarking, Code Generation	Code Available	3
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework	Mar 19, 2024	Benchmarking, Financial Analysis	Code Available	3
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection	Mar 19, 2024	Anomaly Detection, Benchmarking	Code Available	3
Recurrent Drafter for Fast Speculative Decoding in Large Language Models	Mar 14, 2024	Benchmarking, Knowledge Distillation	Code Available	3
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries	Jan 27, 2024	Benchmarking, RAG	Code Available	3
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents	Jan 24, 2024	Benchmarking	Code Available	3
Benchmarking LLMs via Uncertainty Quantification	Jan 23, 2024	Benchmarking, Uncertainty Quantification	Code Available	3
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation	Jan 22, 2024	Benchmarking, Diagnostic	Code Available	3
SEED-Bench: Benchmarking Multimodal Large Language Models	Jan 1, 2024	Benchmarking, Image Generation	Code Available	3
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One	Dec 10, 2023	All, Benchmarking	Code Available	3

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified