Benchmarking

Papers

Showing 4951–4975 of 5548 papers

Title	Date	Tasks	Status
Benchmarking of Query Strategies: Towards Future Deep Active Learning	Dec 10, 2023	Active Learning, Benchmarking	Code Available
Semi-Supervised Learning for Anomaly Traffic Detection via Bidirectional Normalizing Flows	Mar 13, 2024	Anomaly Detection, Benchmarking	Code Available
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks	Mar 15, 2019	Benchmarking, Citation Recommendation	Code Available
Named Clinical Entity Recognition Benchmark	Oct 7, 2024	Benchmarking, Decoder	Code Available
EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models	May 2, 2025	Benchmarking	Code Available
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling	Jan 10, 2023	Benchmarking, Graph Neural Network	Code Available
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Feb 10, 2025	Benchmarking	Code Available
Evaluating the Robustness of Deep Reinforcement Learning for Autonomous Policies in a Multi-agent Urban Driving Environment	Dec 22, 2021	Autonomous Driving, Benchmarking	Code Available
Watts: Infrastructure for Open-Ended Learning	Apr 28, 2022	Benchmarking	Code Available
Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks	Jul 2, 2024	Activity Prediction, Anomaly Detection	Code Available
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems	Jun 25, 2024	Benchmarking, Collaborative Filtering	Code Available
SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification	May 23, 2025	Benchmarking, Classification	Code Available
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses	May 19, 2023	Benchmarking, Form	Code Available
Unsupervised Novelty Detection Methods Benchmarking with Wavelet Decomposition	Sep 11, 2024	Benchmarking, Novelty Detection	Code Available
Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security	Oct 8, 2018	Benchmarking, BIG-bench Machine Learning	Code Available
Transparent and Scrutable Recommendations Using Natural Language User Profiles	Feb 8, 2024	Benchmarking, Descriptive	Code Available
SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations	Jul 8, 2025	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available
SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing	Oct 14, 2024	Benchmarking, Management	Code Available
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction	Nov 8, 2023	Benchmarking, Click-Through Rate Prediction	Code Available
Navigating Out-of-Distribution Electricity Load Forecasting during COVID-19: Benchmarking energy load forecasting models without and with continual learning	Sep 8, 2023	Benchmarking, Continual Learning	Code Available
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles	Jan 15, 2025	Benchmarking	Code Available
NbBench: Benchmarking Language Models for Comprehensive Nanobody Tasks	May 4, 2025	Benchmarking, Representation Learning	Code Available
NCAdapt: Dynamic adaptation with domain-specific Neural Cellular Automata for continual hippocampus segmentation	Oct 30, 2024	Benchmarking, Continual Learning	Code Available
A Systematic Review of Green AI	Jan 26, 2023	Benchmarking	Code Available
Evaluating LLP Methods: Challenges and Approaches	Oct 29, 2023	Benchmarking, Model Selection	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified