SOTAVerified

Benchmarking

Papers

Showing 1361–1370 of 5548 papers

Title | Status | Hype
TDDBench: A Benchmark for Training data detection | | 0
Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping | Code | 2
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | Code | 3
On the Loss of Context-awareness in General Instruction Fine-tuning | Code | 0
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks | Code | 1
Benchmarking XAI Explanations with Human-Aligned Evaluations | | 0
Imagining and building wise machines: The centrality of AI metacognition | | 0
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation | Code | 1
TableGPT2: A Large Multimodal Model with Tabular Data Integration | Code | 4
SinaTools: Open Source Toolkit for Arabic Natural Language Processing | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified