SOTAVerified

Benchmarking

Papers

Showing 3026–3050 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling | | 0 |
| Entity Alignment For Knowledge Graphs: Progress, Challenges, and Empirical Studies | | 0 |
| Entity Personalized Talent Search Models with Tree Interaction Features | | 0 |
| Entropic one-class classifiers | | 0 |
| EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | | 0 |
| Environment-aware UAV Communications: CKM Construction and Predictive Beamforming | | 0 |
| EnvSDD: Benchmarking Environmental Sound Deepfake Detection | | 0 |
| EnzChemRED, a rich enzyme chemistry relation extraction dataset | | 0 |
| EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking | | 0 |
| ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection | | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | | 0 |
| Establishing Reliability Metrics for Reward Models in Large Language Models | | 0 |
| Estimating Task Completion Times for Network Rollouts using Statistical Models within Partitioning-based Regression Methods | | 0 |
| Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices | | 0 |
| Estimating transmission from genetic and epidemiological data: a metric to compare transmission trees | | 0 |
| EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding | | 0 |
| Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization | | 0 |
| Evalita-LLM: Benchmarking Large Language Models on Italian | | 0 |
| Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | | 0 |
| Evaluating Cultural and Social Awareness of LLM Web Agents | | 0 |
| Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models | | 0 |
| Evaluating Financial Sentiment Analysis with Annotators Instruction Assisted Prompting: Enhancing Contextual Interpretation and Stock Prediction Accuracy | | 0 |
| Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches | | 0 |
| Evaluating Generative Models for Tabular Data: Novel Metrics and Benchmarking | | 0 |
| Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | | 0 |
Page 122 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |