SOTAVerified

Benchmarking

Papers

Showing 1501-1510 of 5548 papers

Title | Status | Hype
MUPAX: Multidimensional Problem Agnostic eXplainable AI |  | 0
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Code | 0
DCR: Quantifying Data Contamination in LLMs Evaluation | Code | 0
A Multi-View High-Resolution Foot-Ankle Complex Point Cloud Dataset During Gait for Occlusion-Robust 3D Completion |  | 0
FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning | Code | 0
Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop |  | 0
MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking |  | 0
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks |  | 0
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance |  | 0
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified