Benchmarking

Papers

Showing 4421–4430 of 5548 papers

Title	Date	Tasks	Status	Hype
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	0
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines	Jun 20, 2024	Benchmarking, Decision Making	Code Available	0
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments	Apr 16, 2025	Benchmarking, Causal Inference	Code Available	0
Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy	Aug 9, 2024	Benchmarking, Medical Image Analysis	Code Available	0
Language-based Image Colorization: A Benchmark and Beyond	Mar 19, 2025	Benchmarking, Colorization	Code Available	0
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models	Apr 29, 2025	Benchmarking, Dataset Generation	Code Available	0
BenchENAS: A Benchmarking Platform for Evolutionary Neural Architecture Search	Dec 1, 2022	Benchmarking, GPU	Code Available	0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	Benchmarking, Machine Reading Comprehension	Code Available	0
TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language	Dec 23, 2021	Benchmarking, Clustering	Code Available	0
Can Tree Based Approaches Surpass Deep Learning in Anomaly Detection? A Benchmarking Study	Feb 11, 2024	Anomaly Detection, Benchmarking	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified