
Benchmarking

Papers


Showing 571–580 of 5548 papers

Title	Date	Tasks	Status	Hype
Multi-Agent Environments for Vehicle Routing Problems	Nov 21, 2024	Benchmarking, reinforcement-learning	Code Available	1
StackEval: Benchmarking LLMs in Coding Assistance	Nov 21, 2024	Benchmarking	Code Available	1
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models	Nov 19, 2024	Benchmarking, Deep Learning	Code Available	1
Introducing Milabench: Benchmarking Accelerators for AI	Nov 18, 2024	Benchmarking, Deep Learning	Code Available	1
FM-TS: Flow Matching for Time Series Generation	Nov 12, 2024	Benchmarking, Imputation	Code Available	1
Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification	Nov 11, 2024	Benchmarking, Image Segmentation	Code Available	1
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset	Nov 5, 2024	Benchmarking, Language Modeling	Code Available	1
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation	Nov 4, 2024	Benchmarking, Graph Generation	Code Available	1
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks	Nov 4, 2024	Action Generation, Benchmarking	Code Available	1
ROAD-Waymo: Action Awareness at Scale for Autonomous Driving	Nov 3, 2024	Autonomous Driving, Benchmarking	Code Available	1



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified