SOTAVerified

Benchmarking

Papers

Showing 1211–1220 of 5548 papers

Title | Status | Hype
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding | Code | 1
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Code | 1
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Code | 1
Benchmarking Language Models for Code Syntax Understanding | Code | 1
Page 122 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified