SOTAVerified

Benchmarking

Papers

Showing 3951–3975 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Simulation of Large Scale Neural Networks for Evaluation Applications | | 0 |
| SinaTools: Open Source Toolkit for Arabic Natural Language Processing | | 0 |
| SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study | | 0 |
| Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data | | 0 |
| Single Stage Prediction with Embedded Topic Modeling of Online Reviews for Mobile App Management | | 0 |
| Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites | | 0 |
| Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models | | 0 |
| Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | | 0 |
| Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping | | 0 |
| Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | | 0 |
| Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback | | 0 |
| Skills and Liquidity Barriers to Youth Employment: Medium-term Evidence from a Cash Benchmarking Experiment in Rwanda | | 0 |
| SkyRover: A Modular Simulator for Cross-Domain Pathfinding | | 0 |
| SlangDIT: Benchmarking LLMs in Interpretative Slang Translation | | 0 |
| SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images | | 0 |
| Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI | | 0 |
| SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | | 0 |
| SMPLy Benchmarking 3D Human Pose Estimation in the Wild | | 0 |
| SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos | | 0 |
| SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents | | 0 |
| Social Bias Probing: Fairness Benchmarking for Language Models | | 0 |
| Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities | | 0 |
| Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns | | 0 |
| So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection | | 0 |
| Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |