SOTAVerified

Benchmarking

Papers

Showing 4751–4800 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection | | 0 |
| An approach for benchmarking the numerical solutions of stochastic compartmental models | | 0 |
| Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal | | 0 |
| ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations | | 0 |
| Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents | | 0 |
| An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space | | 0 |
| SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework | | 0 |
| SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates | | 0 |
| Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series | | 0 |
| Solver Scheduling via Answer Set Programming | | 0 |
| Solving the chemical master equation for monomolecular reaction systems analytically: a Doi-Peliti path integral view | | 0 |
| Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | | 0 |
| SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset | | 0 |
| SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | | 0 |
| Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset | | 0 |
| SortBench: Benchmarking LLMs based on their ability to sort lists | | 0 |
| SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0 |
| WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data | | 0 |
| So you think you can track? | | 0 |
| An Analysis of Model Robustness across Concurrent Distribution Shifts | | 0 |
| SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain | | 0 |
| An Analysis of Control Parameters of MOEA/D Under Two Different Optimization Scenarios | | 0 |
| Sparse Deep Nonnegative Matrix Factorization | | 0 |
| Sparse Representation-Based Classification: Orthogonal Least Squares or Orthogonal Matching Pursuit? | | 0 |
| Spatially Binned ROC: A Comprehensive Saliency Metric | | 0 |
| Spatially Correlated Patterns in Adversarial Images | | 0 |
| Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time | | 0 |
| Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms | | 0 |
| Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting | | 0 |
| VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | | 0 |
| Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | | 0 |
| SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | | 0 |
| Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability | | 0 |
| Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations | | 0 |
| ABSA-Bench: Towards the Unified Evaluation of Aspect-based Sentiment Analysis Research | | 0 |
| Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads | | 0 |
| SpeechVerse: A Large-scale Generalizable Audio Language Model | | 0 |
| Speed Benchmarking of Genetic Programming Frameworks | | 0 |
| Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic | | 0 |
| Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view | | 0 |
| Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite | | 0 |
| SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems | | 0 |
| SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | | 0 |
| SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields | | 0 |
| Analysis of different disparity estimation techniques on aerial stereo image datasets | | 0 |
| Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations | | 0 |
| SpiralMLP: A Lightweight Vision MLP Architecture | | 0 |
| ABOUT ML: Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles | | 0 |
| SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs | | 0 |
| Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |