SOTAVerified

Benchmarking

Papers

Showing 2451–2500 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models | | 0 |
| Benchmarking symbolic regression constant optimization schemes | | 0 |
| Benchmarking Surrogate-Assisted Genetic Recommender Systems | | 0 |
| A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values | | 0 |
| A large-scale, physically-based synthetic dataset for satellite pose estimation | | 0 |
| Benchmarking Super-Resolution Algorithms on Real Data | | 0 |
| A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models | | 0 |
| A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects | | 0 |
| Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach | | 0 |
| Benchmarking Sub-Genre Classification For Mainstage Dance Music | | 0 |
| A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning | | 0 |
| Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations | | 0 |
| Geometry-Based Next Frame Prediction from Monocular Video | | 0 |
| Geospatial Foundation Models to Enable Progress on Sustainable Development Goals | | 0 |
| Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms | | 0 |
| Variational Laplace for Bayesian neural networks | | 0 |
| Benchmarking state-of-the-art gradient boosting algorithms for classification | | 0 |
| Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture | | 0 |
| Benchmarking State-of-the-Art Deep Learning Software Tools | | 0 |
| A Large-Scale Evaluation of Speech Foundation Models | | 0 |
| Benchmarking Spiking Neural Network Learning Methods with Varying Locality | | 0 |
| A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images | | 0 |
| A2Perf: Real-World Autonomous Agents Benchmark | | 0 |
| A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based Retinas | | 0 |
| Benchmarking sparse system identification with low-dimensional chaos | | 0 |
| Benchmarking SMT Performance for Farsi Using the TEP++ Corpus | | 0 |
| A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain | | 0 |
| Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies | | 0 |
| A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking | | 0 |
| Benchmarking Single-Image Reflection Removal Algorithms | | 0 |
| A tutorial on multi-view autoencoders using the multi-view-AE library | | 0 |
| Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking | | 0 |
| Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms | | 0 |
| A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking | | 0 |
| Benchmarking Shadow Removal for Facial Landmark Detection and Beyond | | 0 |
| A Large-scale Class-level Benchmark Dataset for Code Generation with LLMs | | 0 |
| Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition | | 0 |
| GenSpace: Benchmarking Spatially-Aware Image Generation | | 0 |
| A Large-Scale Analysis on Self-Supervised Video Representation Learning | | 0 |
| A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior | | 0 |
| On the Evaluation of Engineering Artificial General Intelligence | | 0 |
| Genicious: Contextual Few-shot Prompting for Insights Discovery | | 0 |
| GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks | | 0 |
| Benchmarking Scientific Image Forgery Detectors | | 0 |
| Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam | | 0 |
| Benchmarking Sample Selection Strategies for Batch Reinforcement Learning | | 0 |
| A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking | | 0 |
| Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | | 0 |
| GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models | | 0 |
| GeoGebra Tools with Proof Capabilities | | 0 |
Page 50 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |