SOTAVerified

Benchmarking

Papers

Showing 5251-5300 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings | | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | | 0 |
| Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials | | 0 |
| Energy Management in Storage-Augmented, Grid-Connected Prosumer Buildings and Neighbourhoods Using a Modified Simulated Annealing Optimization | | 0 |
| T^2K^2: The Twitter Top-K Keywords Benchmark | | 0 |
| Enhanced Multiobjective Evolutionary Algorithm based on Decomposition for Solving the Unit Commitment Problem | | 0 |
| A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery | | 0 |
| Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT | | 0 |
| Characterizing the adversarial vulnerability of speech self-supervised learning | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies | | 0 |
| Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures | | 0 |
| TabKAN: Advancing Tabular Data Analysis using Kolmogorov-Arnold Network | | 0 |
| Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement | | 0 |
| Characterizing Missing Information in Deep Networks Using Backpropagated Gradients | | 0 |
| Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages | | 0 |
| Enhancing Navigation Benchmarking and Perception Data Generation for Row-based Crops in Simulation | | 0 |
| Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | | 0 |
| Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | | 0 |
| Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | | 0 |
| Characterization of Multiple 3D LiDARs for Localization and Mapping using Normal Distributions Transform | | 0 |
| Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations | | 0 |
| Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model | | 0 |
| Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs | | 0 |
| TabTreeFormer: Tabular Data Generation Using Hybrid Tree-Transformer | | 0 |
| ALdataset: a benchmark for pool-based active learning | | 0 |
| Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective | | 0 |
| EnronQA: Towards Personalized RAG over Private Documents | | 0 |
| Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling | | 0 |
| Characterization of Constrained Continuous Multiobjective Optimization Problems: A Feature Space Perspective | | 0 |
| TabularQGAN: A Quantum Generative Model for Tabular Data | | 0 |
| Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies | | 0 |
| Entity Personalized Talent Search Models with Tree Interaction Features | | 0 |
| Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models | | 0 |
| A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration | | 0 |
| Entropic one-class classifiers | | 0 |
| EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | | 0 |
| Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution | | 0 |
| Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | | 0 |
| Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity | | 0 |
| EnvSDD: Benchmarking Environmental Sound Deepfake Detection | | 0 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | | 0 |
| Challenges in Benchmarking Stream Learning Algorithms with Real-world Data | | 0 |
| EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | | 0 |
| Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking | | 0 |
| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | | 0 |
| Challenges and perspectives in computational deconvolution of genomics data | | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | | 0 |
| Tackling the Story Ending Biases in The Story Cloze Test | | 0 |
| Establishing Reliability Metrics for Reward Models in Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |