SOTAVerified

Benchmarking

Papers

Showing 3391–3400 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables	Jun 13, 2025	Benchmarking, Descriptive	Unverified	0	0
Benchmarking multimedia technologies with the CAMOMILE platform: the case of Multimodal Person Discovery at MediaEval 2015	May 1, 2016	Benchmarking	Unverified	0	0
LLM-initialized Differentiable Causal Discovery	Oct 28, 2024	Benchmarking, Causal Discovery	Unverified	0	0
Totally Corrective Boosting with Cardinality Penalization	Apr 7, 2015	Benchmarking, Combinatorial Optimization	Unverified	0	0
Benchmarking Multi-Domain Active Learning on Image Classification	Dec 1, 2023	Active Learning, All	Unverified	0	0
LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Feb 18, 2025	Benchmarking, Text Generation	Unverified	0	0
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study	Sep 13, 2024	Benchmarking, Grapheme-to-Phoneme Conversion	Unverified	0	0
Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming	Dec 21, 2023	Benchmarking, reinforcement-learning	Unverified	0	0
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms	Jan 1, 2021	Benchmarking, Deep Reinforcement Learning	Unverified	0	0
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection	Oct 29, 2023	Benchmarking, Diversity	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified