SOTAVerified

Benchmarking

Papers


Showing 2501–2510 of 5548 papers

Title	Date	Tasks	Status	Hype
Personalized Multimodal Large Language Models: A Survey	Dec 3, 2024	Benchmarking, Survey	Unverified	0
Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data	Dec 3, 2024	Benchmarking	Unverified	0
BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts	Dec 3, 2024	Age And Gender Classification, Age and Gender Estimation	Code Available	0
Benchmarking symbolic regression constant optimization schemes	Dec 3, 2024	Benchmarking, regression	Unverified	0
OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations	Dec 3, 2024	Benchmarking, Face Recognition	Unverified	0
Noisy Ostracods: A Fine-Grained, Imbalanced Real-World Dataset for Benchmarking Robust Machine Learning and Label Correction Methods	Dec 3, 2024	Benchmarking	Code Available	0
Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking	Dec 2, 2024	Benchmarking, Decision Making	Unverified	0
Understanding the World's Museums through Vision-Language Reasoning	Dec 2, 2024	Benchmarking, Question Answering	Code Available	0
AI Benchmarks and Datasets for LLM Evaluation	Dec 2, 2024	Benchmarking, Distributed Computing	Unverified	0
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024)	Dec 2, 2024	Benchmarking, High-Level Synthesis	Code Available	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified