SOTAVerified

Benchmarking

Papers

Showing 3976–4000 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| On the Use of Quality Diversity Algorithms for The Traveling Thief Problem | | 0 |
| On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds | | 0 |
| On the Value of ML Models | | 0 |
| TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | | 0 |
| ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | | 0 |
| OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | | 0 |
| OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | | 0 |
| Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0 |
| OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | | 0 |
| Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers | | 0 |
| Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing | | 0 |
| Benchmarking and Performance Modelling of MapReduce Communication Pattern | | 0 |
| TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | | 0 |
| Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms | | 0 |
| Open-CD: A Comprehensive Toolbox for Change Detection | | 0 |
| Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | | 0 |
| OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI | | 0 |
| Open Datasets for Satellite Radio Resource Control | | 0 |
| Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | | 0 |
| OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | | 0 |
| TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models | | 0 |
| Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets | | 0 |
| OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion | | 0 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | | 0 |
| Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |