
Benchmarking

Papers


Showing 2371–2380 of 5548 papers

Title	Date	Tasks	Status	Hype
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices	Jan 7, 2025	Benchmarking, Clustering	Unverified	0
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding	Jan 7, 2025	Benchmarking, Code Generation	Unverified	0
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models	Jan 6, 2025	Benchmarking, Feature Compression	Unverified	0
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input	Jan 6, 2025	Benchmarking, Form	Unverified	0
Tougher Text, Smarter Models: Raising the Bar for Adversarial Defence Benchmarks	Jan 5, 2025	Adversarial Robustness, Benchmarking	Code Available	0
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation	Jan 3, 2025	Benchmarking, Crowd Counting	Code Available	0
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents	Jan 3, 2025	Benchmarking	Unverified	0
AI-Powered Cow Detection in Complex Farm Environments	Jan 3, 2025	Benchmarking	Unverified	0
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture	Jan 3, 2025	Benchmarking, Question Answering	Unverified	0
TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer	Jan 2, 2025	Benchmarking, Quantization	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified