SOTAVerified

Benchmarking

Papers

Showing 111-120 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments | Code | 2 |
| FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | | 0 |
| Attention, Please! Revisiting Attentive Probing for Masked Image Modeling | Code | 1 |
| A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild | Code | 0 |
| GRAIL: A Benchmark for GRaph ActIve Learning in Dynamic Sensing Environments | | 0 |
| Graph Attention-based Decentralized Actor-Critic for Dual-Objective Control of Multi-UAV Swarms | | 0 |
| scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data | Code | 1 |
| CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmark of Large Language Models in Mental Health Counseling | Code | 1 |
| AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP | | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |