SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2251–2260 of 5548 papers

Title	Date	Tasks	Status	Hype
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing	Feb 14, 2025	BenchmarkingRAG	CodeCode Available	0
Standardisation of Convex Ultrasound Data Through Geometric Analysis and Augmentation	Feb 13, 2025	Benchmarking	—Unverified	0
AT-Drone: Benchmarking Adaptive Teaming in Multi-Drone Pursuit	Feb 13, 2025	BenchmarkingEdge-computing	—Unverified	0
Machine learning for modelling unstructured grid data in computational physics: a review	Feb 13, 2025	Benchmarking	—Unverified	0
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis	Feb 13, 2025	Benchmarking	—Unverified	0
SkyRover: A Modular Simulator for Cross-Domain Pathfinding	Feb 13, 2025	Benchmarking	—Unverified	0
A Survey on LLM-based News Recommender Systems	Feb 13, 2025	BenchmarkingFairness	—Unverified	0
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Feb 13, 2025	Benchmarking	—Unverified	0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Feb 13, 2025	BenchmarkingMath	—Unverified	0
Zero-shot generation of synthetic neurosurgical data with large language models	Feb 13, 2025	BenchmarkingDe-identification	CodeCode Available	0

Show:10 25 50

← PrevPage 226 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified