SOTAVerified

Benchmarking

Papers

Showing 2291–2300 of 5548 papers

Title	Date	Tasks	Status	Hype
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures	Jan 1, 2024	Benchmarking, Instance Segmentation	Unverified	0
Better Practices for Domain Adaptation	Sep 7, 2023	Benchmarking, Domain Adaptation	Unverified	0
Barkour: Benchmarking Animal-level Agility with Quadruped Robots	May 24, 2023	Benchmarking, Navigate	Unverified	0
Active Evaluation Acquisition for Efficient LLM Benchmarking	Oct 8, 2024	Benchmarking	Unverified	0
AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering	Jun 3, 2025	Benchmarking	Unverified	0
FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding	Nov 16, 2021	Benchmarking, Natural Language Understanding	Unverified	0
Few-Shot Defect Segmentation Leveraging Abundant Normal Training Samples Through Normal Background Regularization and Crop-and-Paste Operation	Jul 18, 2020	Anomaly Detection, Benchmarking	Unverified	0
Better Bill GPT: Comparing Large Language Models against Legal Invoice Reviewers	Apr 2, 2025	Benchmarking, Management	Unverified	0
BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures	Jun 6, 2025	Benchmarking, CPU	Unverified	0
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali	Oct 16, 2023	Benchmarking, Data Augmentation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified