SOTAVerified

Benchmarking

Papers

Showing 4951-4975 of 5548 papers

Title | Status | Hype
Benchmarking of Query Strategies: Towards Future Deep Active Learning | Code | 0
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows | Code | 0
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks | Code | 0
Named Clinical Entity Recognition Benchmark | Code | 0
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models | Code | 0
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling | Code | 0
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Code | 0
Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment | Code | 0
Watts: Infrastructure for Open-Ended Learning | Code | 0
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Code | 0
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems | Code | 0
SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Code | 0
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | Code | 0
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Code | 0
Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security | Code | 0
Transparent and Scrutable Recommendations Using Natural Language User Profiles | Code | 0
SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Code | 0
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing | Code | 0
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Code | 0
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning | Code | 0
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles | Code | 0
NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks | Code | 0
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Code | 0
A Systematic Review of Green AI | Code | 0
Evaluating LLP Methods: Challenges and Approaches | Code | 0
Page 199 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified