SOTAVerified

Benchmarking

Papers

Showing 3951–4000 of 5548 papers

Title | Status | Hype
Decisions and Performance Under Bounded Rationality: A Computational Benchmarking Approach | | 0
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share | | 0
What Will it Take to Fix Benchmarking in Natural Language Understanding? | | 0
Transformed Subspace Clustering | | 0
On the Evaluation of Speech Foundation Models for Spoken Language Understanding | | 0
On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel | | 0
Transformers in Protein: A Survey | | 0
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics | | 0
On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks | | 0
Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes | | 0
On the Interaction of Belief Bias and Explanations | | 0
Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark | | 0
On the Performance of Multimodal Language Models | | 0
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks | | 0
On the project risk baseline: integrating aleatory uncertainty into project scheduling | | 0
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild | | 0
On the reduction of Linear Parameter-Varying State-Space models | | 0
On the relationship between Benchmarking, Standards and Certification in Robotics and AI | | 0
On the Reliability and Validity of Detecting Approval of Political Actors in Tweets | | 0
On the Robustness of Human-Object Interaction Detection against Distribution Shift | | 0
On the role of benchmarking data sets and simulations in method comparison studies | | 0
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning | | 0
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | | 0
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning | | 0
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems | | 0
On the Use of Quality Diversity Algorithms for The Traveling Thief Problem | | 0
On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds | | 0
On the Value of ML Models | | 0
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | | 0
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | | 0
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images | | 0
OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations | | 0
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking | | 0
Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers | | 0
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing | | 0
Benchmarking and Performance Modelling of MapReduce Communication Pattern | | 0
TransOpt: Transformer-based Representation Learning for Optimization Problem Classification | | 0
Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms | | 0
Open-CD: A Comprehensive Toolbox for Change Detection | | 0
Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation | | 0
OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI | | 0
Open Datasets for Satellite Radio Resource Control | | 0
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors | | 0
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation | | 0
TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models | | 0
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets | | 0
OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion | | 0
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | | 0
Benchmarking and Improving Generator-Validator Consistency of Language Models | | 0
Page 80 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified