SOTAVerified

Benchmarking

Papers

Showing 2411–2420 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Generative AI Models for Deep Learning Test Input Generation | Code | 0 |
| Chumor 2.0: Towards Benchmarking Chinese Humor Understanding | Code | 0 |
| Multimodal Deep Reinforcement Learning for Portfolio Optimization | | 0 |
| SCBench: A Sports Commentary Benchmark for Video LLMs | | 0 |
| StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs | | 0 |
| Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | | 0 |
| HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios | Code | 0 |
| First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network | Code | 0 |
| Patherea: Cell Detection and Classification for the 2020s | | 0 |
| Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |