Benchmarking

Papers

Showing 801–850 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness	Mar 24, 2025	Benchmarking, Semantic Segmentation	Code Available	1	5
Benchmarks for Deep Off-Policy Evaluation	Mar 30, 2021	Benchmarking, continuous-control	Code Available	1	5
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions	May 27, 2022	Benchmarking, Few-Shot Image Classification	Code Available	1	5
Recent Advances on Neural Network Pruning at Initialization	Mar 11, 2021	Benchmarking, Network Pruning	Code Available	1	5
Boosting Neural Image Compression for Machines Using Latent Space Masking	Dec 15, 2021	Benchmarking, Image Compression	Code Available	1	5
Enhancing Biomedical Relation Extraction with Directionality	Jan 23, 2025	Benchmarking, Document-level Relation Extraction	Code Available	1	5
Benchmarking Algorithms for Federated Domain Generalization	Jul 11, 2023	Benchmarking, Diversity	Code Available	1	5
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler	Feb 2, 2023	Benchmarking, Evolutionary Algorithms	Code Available	1	5
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text	Apr 28, 2025	Benchmarking	Code Available	1	5
Evaluating Adversarial Attacks on ImageNet: A Reality Check on Misclassification Classes	Nov 22, 2021	Benchmarking	Code Available	1	5
Federated Learning Under Intermittent Client Availability and Time-Varying Communication Constraints	May 13, 2022	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images	Dec 8, 2023	Benchmarking, Object	Code Available	1	5
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase	Jun 21, 2023	3D-Aware Image Synthesis, Benchmarking	Code Available	1	5
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms	Sep 21, 2022	3D human pose and shape estimation, Benchmarking	Code Available	1	5
A Benchmarking Study of Kolmogorov-Arnold Networks on Tabular Data	Jun 20, 2024	Benchmarking, Kolmogorov-Arnold Networks	Code Available	1	5
Benchmarking and scaling of deep learning models for land cover image classification	Nov 18, 2021	Benchmarking, Classification	Code Available	1	5
Benchmarking and Analyzing Point Cloud Classification under Corruptions	Feb 7, 2022	Benchmarking, Classification	Code Available	1	5
Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples	Jul 31, 2023	Adversarial Robustness, Benchmarking	Code Available	1	5
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic Segmentation, Benchmarking	Code Available	1	5
Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding	Jul 17, 2023	Benchmarking, Deep Learning	Code Available	1	5
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	Benchmarking, Disentanglement	Code Available	1	5
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1	5
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models	Dec 21, 2023	Benchmarking	Code Available	1	5
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	Benchmarking, Prediction	Code Available	1	5
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios	May 22, 2025	Benchmarking	Code Available	1	5
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling	Oct 14, 2022	Benchmarking, Language Modeling	Code Available	1	5
ByzFL: Research Framework for Robust Federated Learning	May 30, 2025	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	Benchmarking, GPU	Code Available	1	5
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1	5
Benchmarking Meta-embeddings: What Works and What Does Not	Nov 1, 2021	Benchmarking, Embeddings Evaluation	Code Available	1	5
EgoNormia: Benchmarking Physical Social Norm Understanding	Feb 27, 2025	Answer Generation, Benchmarking	Code Available	1	5
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	Benchmarking, Community Detection	Code Available	1	5
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning	Jan 15, 2021	Benchmarking, Misinformation	Code Available	1	5
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoML, Benchmarking	Code Available	1	5
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk	Jul 2, 2022	Benchmarking, Machine Translation	Code Available	1	5
Benchmarking machine learning models on multi-centre eICU critical care dataset	Oct 2, 2019	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym	Dec 6, 2023	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	Code Available	1	5
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection	Mar 12, 2025	Benchmarking, Code Classification	Code Available	1	5
Improving and Benchmarking Offline Reinforcement Learning Algorithms	Jun 1, 2023	Attribute, Benchmarking	Code Available	1	5
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds	Apr 25, 2023	Benchmarking, Pose Estimation	Code Available	1	5
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	Code Available	1	5
Benchmarking and Survey of Explanation Methods for Black Box Models	Feb 25, 2021	Benchmarking, Survey	Code Available	1	5
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders	Jun 4, 2024	Benchmarking, Clustering	Code Available	1	5
ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning	Feb 8, 2022	Benchmarking, Language Modelling	Code Available	1	5
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	Benchmarking, Cancer Classification	Code Available	1	5
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark	Jun 5, 2024	Benchmarking	Code Available	1	5
Benchmarking Meaning Representations in Neural Semantic Parsing	Nov 1, 2020	Benchmarking, Semantic Parsing	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified