Benchmarking

Papers

Showing 4851–4900 of 5548 papers

Title	Date	Tasks	Status
Mol-MoE: Training Preference-Guided Routers for Molecule Generation	Feb 8, 2025	Benchmarking, Drug Design	Code Available
Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks	Jul 17, 2024	Adversarial Robustness, Benchmarking	Code Available
Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene	Sep 7, 2021	Benchmarking, Fine-Grained Image Recognition	Code Available
Moment Matching for Multi-Source Domain Adaptation	Dec 4, 2018	Benchmarking, Domain Adaptation	Code Available
Benchmarking Robustness to Text-Guided Corruptions	Apr 6, 2023	Benchmarking, Data Augmentation	Code Available
Fine-grained Entity Recognition with Reduced False Negatives and Large Type Coverage	Apr 30, 2019	Benchmarking	Code Available
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0	Aug 23, 2023	Benchmarking, regression	Code Available
Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data	Sep 24, 2024	Benchmarking, Depth Estimation	Code Available
Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving	Mar 20, 2023	3D Object Detection, Autonomous Driving	Code Available
Scission: Performance-driven and Context-aware Cloud-Edge Distribution of Deep Neural Networks	Aug 8, 2020	Benchmarking, Decision Making	Code Available
ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles	Mar 13, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming	Jul 17, 2019	Autonomous Driving, Benchmarking	Code Available
Motley: Benchmarking Heterogeneity and Personalization in Federated Learning	Jun 18, 2022	Benchmarking, Fairness	Code Available
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning	May 30, 2023	Benchmarking, In-Context Learning	Code Available
Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization	Jun 21, 2024	Benchmarking, Segmentation	Code Available
The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA	May 2, 2024	Benchmarking, Drug Discovery	Code Available
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs	May 27, 2025	Benchmarking, Question Selection	Code Available
Benchmarking Representation Learning for Natural World Image Collections	Mar 30, 2021	Benchmarking, Binary Classification	Code Available
Benchmarking Reinforcement Learning Algorithms on Real-World Robots	Sep 20, 2018	Benchmarking, continuous-control	Code Available
Benchmarking Quantum Reinforcement Learning	Jan 27, 2025	Benchmarking, reinforcement-learning	Code Available
MSAMSum: Towards Benchmarking Multi-lingual Dialogue Summarization	May 1, 2022	Benchmarking, dialogue summary	Code Available
Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models	Jun 22, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare	Apr 15, 2025	Benchmarking, Diagnostic	Code Available
Benchmarking quantum machine learning kernel training for classification tasks	Aug 17, 2024	Benchmarking, Quantum Machine Learning	Code Available
The Saudi Privacy Policy Dataset	Apr 5, 2023	Benchmarking	Code Available
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation	Jan 9, 2024	Benchmarking, Interactive Segmentation	Code Available
ferret: a Framework for Benchmarking Explainers on Transformers	Aug 2, 2022	Benchmarking, Explainable Artificial Intelligence (XAI)	Code Available
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish	Sep 13, 2023	Benchmarking, Translation	Code Available
FEET: A Framework for Evaluating Embedding Techniques	Nov 2, 2024	Benchmarking, Representation Learning	Code Available
Benchmarking Probabilistic Deep Learning Methods for License Plate Recognition	Feb 2, 2023	Benchmarking, Deep Learning	Code Available
Unraveling the Capabilities of Language Models in News Summarization	Jan 30, 2025	Benchmarking, Few-Shot Learning	Code Available
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Jun 26, 2025	Anomaly Detection, Benchmarking	Code Available
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks	Apr 18, 2021	Benchmarking, Federated Learning	Code Available
MUBen: Benchmarking the Uncertainty of Molecular Representation Models	Jun 14, 2023	Benchmarking, Drug Discovery	Code Available
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection	Sep 17, 2024	Benchmarking, Event Detection	Code Available
WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection	Mar 13, 2020	Abuse Detection, Benchmarking	Code Available
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs	Jun 8, 2023	Benchmarking, Federated Learning	Code Available
Fedivertex: a Graph Dataset based on Decentralized Social Networks for Trustworthy Machine Learning	May 27, 2025	Benchmarking	Code Available
Feature interpretability in BCIs: exploring the role of network lateralization	Jul 16, 2024	Benchmarking, EEG	Code Available
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?	Oct 28, 2024	Benchmarking, Question Answering	Code Available
Benchmarking pre-trained text embedding models in aligning built asset information	Nov 18, 2024	Asset Management, Benchmarking	Code Available
Benchmarking Pre-trained Language Models for Multilingual NER: TraSpaS at the BSNLP2021 Shared Task	Apr 1, 2021	Benchmarking, Language Modeling	Code Available
Feature embedding in click-through rate prediction	Sep 20, 2022	Benchmarking, Click-Through Rate Prediction	Code Available
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks	Jun 16, 2023	Benchmarking	Code Available
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback	Oct 12, 2024	Benchmarking	Code Available
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis	Feb 18, 2025	Benchmarking, Mamba	Code Available
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval	Nov 3, 2023	Benchmarking, Fairness	Code Available
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements	Dec 4, 2020	Benchmarking, Lip password classification	Code Available
Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models	Oct 12, 2024	Benchmarking, Misinformation	Code Available
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting	Aug 27, 2024	Benchmarking, Decoder	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified