SOTAVerified

Benchmarking

Papers

Showing 521-530 of 5548 papers

Title | Status | Hype
Benchmarking LLMs for Political Science: A United Nations Perspective | Code | 1
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope? | Code | 1
ILIAS: Instance-Level Image retrieval At Scale | Code | 1
Positional Encoding in Transformer-Based Time Series Models: A Survey | Code | 1
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims | Code | 1
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs | Code | 1
LOB-Bench: Benchmarking Generative AI for Finance -- an Application to Limit Order Book Data | Code | 1
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation | Code | 1
Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments | Code | 1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified