SOTAVerified

Benchmarking

Papers

Showing 651–660 of 5548 papers

Title	Date	Tasks	Status	Hype
Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers	Apr 2, 2025	Benchmarking, Management	Unverified	0
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks	Apr 2, 2025	Benchmarking, Language Modeling	Unverified	0
Horizon Scans can be accelerated using novel information retrieval and artificial intelligence tools	Apr 2, 2025	Active Learning, Articles	Unverified	0
FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking	Apr 2, 2025	3D Scene Reconstruction, Benchmarking	Unverified	0
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing	Apr 2, 2025	3D Reconstruction, Benchmarking	Code Available	1
Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions	Apr 2, 2025	Benchmarking, Segmentation	Unverified	0
Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries	Apr 2, 2025	Benchmarking, Computational Efficiency	Unverified	0
Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework	Apr 2, 2025	Benchmarking, Synthetic Data Generation	Code Available	2
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models	Apr 1, 2025	Benchmarking	Unverified	0
TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images	Apr 1, 2025	Autonomous Navigation, Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified