SOTAVerified

Benchmarking

Papers

Showing 3951-4000 of 5548 papers

Title | Status | Hype
Simulation of Large Scale Neural Networks for Evaluation Applications | | 0
SinaTools: Open Source Toolkit for Arabic Natural Language Processing | | 0
SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study | | 0
Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data | | 0
Single Stage Prediction with Embedded Topic Modeling of Online Reviews for Mobile App Management | | 0
Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites | | 0
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models | | 0
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | | 0
Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping | | 0
Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | | 0
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback | | 0
Skills and Liquidity Barriers to Youth Employment: Medium-term Evidence from a Cash Benchmarking Experiment in Rwanda | | 0
SkyRover: A Modular Simulator for Cross-Domain Pathfinding | | 0
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation | | 0
SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images | | 0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI | | 0
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | | 0
SMPLy Benchmarking 3D Human Pose Estimation in the Wild | | 0
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos | | 0
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents | | 0
Social Bias Probing: Fairness Benchmarking for Language Models | | 0
Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities | | 0
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns | | 0
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection | | 0
Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal | | 0
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents | | 0
SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework | | 0
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates | | 0
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series | | 0
Solver Scheduling via Answer Set Programming | | 0
Solving the chemical master equation for monomolecular reaction systems analytically: a Doi-Peliti path integral view | | 0
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | | 0
SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset | | 0
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | | 0
SortBench: Benchmarking LLMs based on their ability to sort lists | | 0
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0
So you think you can track? | | 0
SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain | | 0
Sparse Deep Nonnegative Matrix Factorization | | 0
Sparse Representation-Based Classification: Orthogonal Least Squares or Orthogonal Matching Pursuit? | | 0
Spatially Binned ROC: A Comprehensive Saliency Metric | | 0
Spatially Correlated Patterns in Adversarial Images | | 0
Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting | | 0
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | | 0
SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | | 0
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads | | 0
SpeechVerse: A Large-scale Generalizable Audio Language Model | | 0
Speed Benchmarking of Genetic Programming Frameworks | | 0
SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems | | 0
SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | | 0
Page 80 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified