SOTAVerified

Benchmarking

Papers

Showing 4951–5000 of 5548 papers

Title | Status | Hype
Benchmarking of Query Strategies: Towards Future Deep Active Learning | Code | 0
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows | Code | 0
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks | Code | 0
Named Clinical Entity Recognition Benchmark | Code | 0
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models | Code | 0
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling | Code | 0
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Code | 0
Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment | Code | 0
Watts: Infrastructure for Open-Ended Learning | Code | 0
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks | Code | 0
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems | Code | 0
SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification | Code | 0
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses | Code | 0
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition | Code | 0
Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security | Code | 0
Transparent and Scrutable Recommendations Using Natural Language User Profiles | Code | 0
SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Code | 0
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing | Code | 0
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Code | 0
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning | Code | 0
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles | Code | 0
NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks | Code | 0
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation | Code | 0
A Systematic Review of Green AI | Code | 0
Evaluating LLP Methods: Challenges and Approaches | Code | 0
Evaluating Feature Attribution Methods in the Image Domain | Code | 0
NegBio: a high-performance tool for negation and uncertainty detection in radiology reports | Code | 0
A Comprehensive Comparison of Multi-Dimensional Image Denoising Methods | Code | 0
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration | Code | 0
NengoDL: Combining deep learning and neuromorphic modelling methods | Code | 0
Evaluating AI Recruitment Sourcing Tools by Human Preference | Code | 0
EvalAI: Towards Better Evaluation Systems for AI Agents | Code | 0
Essential guidelines for computational method benchmarking | Code | 0
Benchmarking of LSTM Networks | Code | 0
NerveNet: Learning Structured Policy with Graph Neural Networks | Code | 0
How Fragile is Relation Extraction under Entity Replacements? | Code | 0
Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress? | Code | 0
Sequence-Aware Recommender Systems | Code | 0
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation | Code | 0
Enterprise Benchmarks for Large Language Model Evaluation | Code | 0
Enriching Social Science Research via Survey Item Linking | Code | 0
Sequential Large Language Model-Based Hyper-parameter Optimization | Code | 0
Neural Network Design: Learning from Neural Architecture Search | Code | 0
Benchmarking of image registration methods for differently stained histological slides | Code | 0
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs | Code | 0
Enhancing Video Summarization with Context Awareness | Code | 0
Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective | Code | 0
Benchmarking Neural Machine Translation for Southern African Languages | Code | 0
Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization | Code | 0
Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease | Code | 0
Page 100 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified