SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1801–1810 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking MOEAs for solving continuous multi-objective RL problems	May 19, 2025	BenchmarkingEvolutionary Algorithms	CodeCode Available	0
SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference	May 19, 2025	BenchmarkingEEG	—Unverified	0
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity	May 19, 2025	Benchmarkingfeature selection	CodeCode Available	0
Disambiguation in Conversational Question Answering in the Era of LLM: A Survey	May 18, 2025	BenchmarkingConversational Question Answering	—Unverified	0
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models	May 18, 2025	ArticlesBenchmarking	—Unverified	0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind	May 18, 2025	BenchmarkingScene Understanding	—Unverified	0
CompBench: Benchmarking Complex Instruction-guided Image Editing	May 18, 2025	BenchmarkingInstruction Following	—Unverified	0
OSS-Bench: Benchmark Generator for Coding LLMs	May 18, 2025	Benchmarking	CodeCode Available	0
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation	May 17, 2025	Benchmarking	—Unverified	0
SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds	May 17, 2025	BenchmarkingBinary Classification	CodeCode Available	0

Show:10 25 50

← PrevPage 181 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified