SOTAVerified

Benchmarking

Papers


Showing 1121–1130 of 5548 papers

Title	Date	Tasks	Status	Hype
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools	Jan 1, 2025	Benchmarking	Unverified	0
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions	Jan 1, 2025	Benchmarking	Code Available	0
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark	Jan 1, 2025	Benchmarking, Image Segmentation	Code Available	2
On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds	Jan 1, 2025	Benchmarking	Unverified	0
Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries	Dec 31, 2024	Benchmarking, Out-of-Distribution Generalization	Unverified	0
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning	Dec 31, 2024	Benchmarking, Logical Reasoning	Code Available	4
A review of faithfulness metrics for hallucination assessment in Large Language Models	Dec 31, 2024	Benchmarking, Hallucination	Unverified	0
AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects	Dec 31, 2024	Benchmarking, Multiple-choice	Unverified	0
Measuring Large Language Models Capacity to Annotate Journalistic Sourcing	Dec 30, 2024	Benchmarking, Ethics	Unverified	0
TrajLearn: Trajectory Prediction Learning using Deep Generative Models	Dec 30, 2024	Autonomous Navigation, Benchmarking	Code Available	1


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified