SOTAVerified

Benchmarking

Papers

Showing 4701–4750 of 5548 papers

Title | Status | Hype
Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction | | 0
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | | 0
Simple Feedfoward Neural Networks are Almost All You Need for Time Series Forecasting | | 0
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment | | 0
Accelerating IoV Intrusion Detection: Benchmarking GPU-Accelerated vs CPU-Based ML Libraries | | 0
VeriContaminated: Assessing LLM-Driven Verilog Coding for Data Contamination | | 0
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts | | 0
Verifiable Format Control for Large Language Model Generations | | 0
Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions | | 0
Simulation of Large Scale Neural Networks for Evaluation Applications | | 0
An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | | 0
SinaTools: Open Source Toolkit for Arabic Natural Language Processing | | 0
SINDy vs Hard Nonlinearities and Hidden Dynamics: a Benchmarking Study | | 0
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | | 0
An evaluation framework for comparing causal inference models | | 0
Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data | | 0
Single Stage Prediction with Embedded Topic Modeling of Online Reviews for Mobile App Management | | 0
An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models | | 0
Site2Vec: a reference frame invariant algorithm for vector embedding of protein-ligand binding sites | | 0
An Empirical Study of Super-resolution on Low-resolution Micro-expression Recognition | | 0
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models | | 0
An Empirical Study of Benchmarking Chinese Aspect Sentiment Quad Prediction | | 0
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation | | 0
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | | 0
Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping | | 0
Sketch 'n Solve: An Efficient Python Package for Large-Scale Least Squares Using Randomized Numerical Linear Algebra | | 0
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback | | 0
Skills and Liquidity Barriers to Youth Employment: Medium-term Evidence from a Cash Benchmarking Experiment in Rwanda | | 0
SkyRover: A Modular Simulator for Cross-Domain Pathfinding | | 0
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation | | 0
A Case for Dataset Specific Profiling | | 0
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets | | 0
An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification | | 0
AN ELIXIR FOR BLOCKCHAIN SCALABILITY WITH CHANNEL BASED CLUSTERED SHARDING | | 0
SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images | | 0
An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks | | 0
Smiling Women Pitching Down: Auditing Representational and Presentational Gender Biases in Image Generative AI | | 0
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | | 0
An efficiency analysis of Spanish airports | | 0
An EEG-based Stereoscopic Research to Reveal the Brain's Response to What Happens Before and After Watching 2D and 3D Movies | | 0
An Early Warning Sign of Critical Transition in The Antarctic Ice Sheet -- A Data Driven Tool for Spatiotemporal Tipping Point | | 0
SMPLy Benchmarking 3D Human Pose Estimation in the Wild | | 0
Absolute Ranking: An Essential Normalization for Benchmarking Optimization Algorithms | | 0
VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution | | 0
SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos | | 0
SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents | | 0
Window-of-interest based Multi-objective Evolutionary Search for Satisficing Concepts | | 0
Social Bias Probing: Fairness Benchmarking for Language Models | | 0
Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities | | 0
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified