Benchmarking

Papers


Showing 4951–5000 of 5548 papers

Title	Date	Tasks	Status
Benchmarking of Query Strategies: Towards Future Deep Active Learning	Dec 10, 2023	Active Learning, Benchmarking	Code Available
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows	Mar 13, 2024	Anomaly Detection, Benchmarking	Code Available
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks	Mar 15, 2019	Benchmarking, Citation Recommendation	Code Available
Named Clinical Entity Recognition Benchmark	Oct 7, 2024	Benchmarking, Decoder	Code Available
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models	May 2, 2025	Benchmarking	Code Available
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling	Jan 10, 2023	Benchmarking, Graph Neural Network	Code Available
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Feb 10, 2025	Benchmarking	Code Available
Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment	Dec 22, 2021	Autonomous Driving, Benchmarking	Code Available
Watts: Infrastructure for Open-Ended Learning	Apr 28, 2022	Benchmarking	Code Available
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks	Jul 2, 2024	Activity Prediction, Anomaly Detection	Code Available
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems	Jun 25, 2024	Benchmarking, Collaborative Filtering	Code Available
SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification	May 23, 2025	Benchmarking, Classification	Code Available
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses	May 19, 2023	Benchmarking, Form	Code Available
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition	Sep 11, 2024	Benchmarking, Novelty Detection	Code Available
Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security	Oct 8, 2018	Benchmarking, BIG-bench Machine Learning	Code Available
Transparent and Scrutable Recommendations Using Natural Language User Profiles	Feb 8, 2024	Benchmarking, Descriptive	Code Available
SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations	Jul 8, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing	Oct 14, 2024	Benchmarking, Management	Code Available
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction	Nov 8, 2023	Benchmarking, Click-Through Rate Prediction	Code Available
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning	Sep 8, 2023	Benchmarking, Continual Learning	Code Available
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles	Jan 15, 2025	Benchmarking	Code Available
NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks	May 4, 2025	Benchmarking, Representation Learning	Code Available
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation	Oct 30, 2024	Benchmarking, Continual Learning	Code Available
A Systematic Review of Green AI	Jan 26, 2023	Benchmarking	Code Available
Evaluating LLP Methods: Challenges and Approaches	Oct 29, 2023	Benchmarking, Model Selection	Code Available
Evaluating Feature Attribution Methods in the Image Domain	Feb 22, 2022	Benchmarking	Code Available
NegBio: a high-performance tool for negation and uncertainty detection in radiology reports	Dec 16, 2017	Benchmarking, Negation	Code Available
A Comprehensive Comparison of Multi-Dimensional Image Denoising Methods	Nov 6, 2020	Benchmarking, Denoising	Code Available
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration	Sep 1, 2023	Articles, Benchmarking	Code Available
NengoDL: Combining deep learning and neuromorphic modelling methods	May 28, 2018	Benchmarking, Deep Learning	Code Available
Evaluating AI Recruitment Sourcing Tools by Human Preference	Apr 3, 2025	Benchmarking	Code Available
EvalAI: Towards Better Evaluation Systems for AI Agents	Feb 10, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
Essential guidelines for computational method benchmarking	Dec 3, 2018	Benchmarking	Code Available
Benchmarking of LSTM Networks	Aug 11, 2015	Benchmarking	Code Available
NerveNet: Learning Structured Policy with Graph Neural Networks	Jan 1, 2018	Benchmarking, continuous-control	Code Available
How Fragile is Relation Extraction under Entity Replacements?	May 22, 2023	Benchmarking, Causal Inference	Code Available
Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?	Feb 25, 2020	Benchmarking, Link Prediction	Code Available
Sequence-Aware Recommender Systems	Feb 23, 2018	Benchmarking, Matrix Completion	Code Available
WCEbleedGen: A wireless capsule endoscopy dataset and its benchmarking for automatic bleeding classification, detection, and segmentation	Aug 22, 2024	Benchmarking, Classification	Code Available
Enterprise Benchmarks for Large Language Model Evaluation	Oct 11, 2024	Benchmarking, Language Model Evaluation	Code Available
Enriching Social Science Research via Survey Item Linking	Dec 20, 2024	Benchmarking, Entity Disambiguation	Code Available
Sequential Large Language Model-Based Hyper-parameter Optimization	Oct 27, 2024	Bayesian Optimization, Benchmarking	Code Available
Neural Network Design: Learning from Neural Architecture Search	Nov 1, 2020	Benchmarking, image-classification	Code Available
Benchmarking of image registration methods for differently stained histological slides	Oct 11, 2018	Benchmarking, BIRL	Code Available
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs	Jun 21, 2022	Anomaly Detection, Benchmarking	Code Available
Enhancing Video Summarization with Context Awareness	Apr 6, 2024	Benchmarking, Informativeness	Code Available
Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective	May 8, 2025	Active Learning, Benchmarking	Code Available
Benchmarking Neural Machine Translation for Southern African Languages	Jun 17, 2019	Benchmarking, Machine Translation	Code Available
Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization	Jan 31, 2023	Benchmarking	Code Available
Enhancing Biomedical Knowledge Discovery for Diseases: An Open-Source Framework Applied on Rett Syndrome and Alzheimer's Disease	Jul 18, 2024	Benchmarking	Code Available



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified