SOTAVerified

Benchmarking

Papers

Showing 3001–3050 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI | | 0 |
| Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler | | 0 |
| Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices | | 0 |
| Enabling Accelerators for Graph Computing | | 0 |
| Automated Machine Learning: A Case Study on Non-Intrusive Appliance Load Monitoring | | 0 |
| Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design | | 0 |
| EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting | | 0 |
| 1-D Convlutional Neural Networks for the Analysis of Pupil Size Variations in Scotopic Conditions | | 0 |
| End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings | | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | | 0 |
| Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials | | 0 |
| Energy Management in Storage-Augmented, Grid-Connected Prosumer Buildings and Neighbourhoods Using a Modified Simulated Annealing Optimization | | 0 |
| Enhanced Multiobjective Evolutionary Algorithm based on Decomposition for Solving the Unit Commitment Problem | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Enhancing Explainability and Reliable Decision-Making in Particle Swarm Optimization through Communication Topologies | | 0 |
| Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures | | 0 |
| Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement | | 0 |
| Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages | | 0 |
| Enhancing Navigation Benchmarking and Perception Data Generation for Row-based Crops in Simulation | | 0 |
| Enhancing Post-Hoc Explanation Benchmark Reliability for Image Classification | | 0 |
| Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | | 0 |
| Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | | 0 |
| Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations | | 0 |
| Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs | | 0 |
| EnronQA: Towards Personalized RAG over Private Documents | | 0 |
| Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling | | 0 |
| Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies | | 0 |
| Entity Personalized Talent Search Models with Tree Interaction Features | | 0 |
| Entropic one-class classifiers | | 0 |
| EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | | 0 |
| Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | | 0 |
| EnvSDD: Benchmarking Environmental Sound Deepfake Detection | | 0 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | | 0 |
| EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | | 0 |
| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | | 0 |
| Establishing Reliability Metrics for Reward Models in Large Language Models | | 0 |
| Estimating Task Completion Times for Network Rollouts using Statistical Models within Partitioning-based Regression Methods | | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | | 0 |
| Estimating transmission from genetic and epidemiological data: a metric to compare transmission trees | | 0 |
| EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding | | 0 |
| Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization | | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | | 0 |
| Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | | 0 |
| Evaluating Cultural and Social Awareness of LLM Web Agents | | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | | 0 |
| Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy | | 0 |
| Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches | | 0 |
| Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | | 0 |
| Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |