SOTAVerified

Benchmarking

Papers

Showing 1521–1530 of 5548 papers

Title | Status | Hype
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | Code | 3
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | | 0
Active Evaluation Acquisition for Efficient LLM Benchmarking | | 0
Manual Verbalizer Enrichment for Few-Shot Text Classification | | 0
Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective | Code | 1
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers | Code | 0
FedGraph: A Research Library and Benchmark for Federated Graph Learning | Code | 2
Benchmarking of a new data splitting method on volcanic eruption data | | 0
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems | | 0
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified