Benchmarking

Papers

Showing 2501–2550 of 5548 papers

Title	Date	Tasks	Status
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking	Feb 28, 2023	Adversarial Robustness, Benchmarking	Unverified
Graph-based Deep-Tree Recursive Neural Network (DTRNN) for Text Classification	Sep 4, 2018	Benchmarking, General Classification	Unverified
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra	Mar 5, 2021	Benchmarking, Graph Mining	Unverified
Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation	Dec 16, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition	Jan 10, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
7th AI Driving Olympics: 1st Place Report for Panoptic Tracking	Dec 9, 2021	Benchmarking, Panoptic Segmentation	Unverified
A Theory of Dynamic Benchmarks	Oct 6, 2022	Benchmarking	Unverified
Variational Laplace for Bayesian neural networks	Nov 20, 2020	Benchmarking, Variational Inference	Unverified
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games	Aug 28, 2024	Atari Games, Benchmarking	Unverified
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness	May 5, 2023	Benchmarking, Dataset Distillation	Unverified
GPTs and Language Barrier: A Cross-Lingual Legal QA Examination	Mar 26, 2024	Articles, Benchmarking	Unverified
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities	May 13, 2025	automatic-speech-translation, Benchmarking	Unverified
Benchmarking Robustness of Deep Reinforcement Learning approaches to Online Portfolio Management	Jun 19, 2023	Benchmarking, Deep Reinforcement Learning	Unverified
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation	Mar 2, 2022	Benchmarking, Deep Learning	Unverified
A tale of two toolkits, report the first: benchmarking time series classification algorithms for correctness and efficiency	Sep 12, 2019	Benchmarking, General Classification	Unverified
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval	Jan 15, 2025	Benchmarking, Contrastive Learning	Unverified
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities	Jun 6, 2023	Benchmarking, Depth Completion	Unverified
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models	Jun 17, 2024	Benchmarking, Survey	Unverified
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models	Jun 3, 2023	Benchmarking	Unverified
AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals	May 21, 2025	Benchmarking, Chatbot	Unverified
GreenPCO: An Unsupervised Lightweight Point Cloud Odometry Method	Dec 8, 2021	Benchmarking, Object	Unverified
Granular Change Accuracy: A More Accurate Performance Metric for Dialogue State Tracking	Mar 17, 2024	Benchmarking, Dialogue State Tracking	Unverified
Benchmarking Robustness in Neural Radiance Fields	Jan 10, 2023	Benchmarking, Camera Calibration	Unverified
A Systematic Evaluation of Domain Adaptation Algorithms On Time Series Data	Sep 29, 2021	Benchmarking, Domain Adaptation	Unverified
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO	Aug 30, 2023	Benchmarking, Reinforcement Learning (RL)	Unverified
Benchmarking Robot Manipulation with the Rubik's Cube	Feb 14, 2022	Benchmarking, Robot Manipulation	Unverified
A Comprehensive Multi-Illuminant Dataset for Benchmarking of the Intrinsic Image Algorithms	Dec 1, 2015	Benchmarking, Image Generation	Unverified
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness	May 13, 2024	Benchmarking, counterfactual	Unverified
A Systematic Analysis of Hybrid Linear Attention	Jul 8, 2025	Benchmarking, Language Modeling	Unverified
Benchmarking Retrieval-Augmented Generation for Chemistry	May 12, 2025	Benchmarking, RAG	Unverified
Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences	Nov 14, 2022	Autonomous Driving, Benchmarking	Unverified
Airport Capacity and Performance in Europe -- A study of transport economics, service quality and sustainability	Feb 4, 2021	Benchmarking	Unverified
Benchmarking Resource Usage for Efficient Distributed Deep Learning	Jan 28, 2022	Benchmarking, Deep Learning	Unverified
Goal-Driven Sequential Data Abstraction	Jul 29, 2019	Benchmarking, General Reinforcement Learning	Unverified
A Survey on Vision Autoregressive Model	Nov 13, 2024	3D Generation, Benchmarking	Unverified
A Survey on Temporal Sentence Grounding in Videos	Sep 16, 2021	Action Localization, Benchmarking	Unverified
Benchmarking Reinforcement Learning Methods for Dexterous Robotic Manipulation with a Three-Fingered Gripper	Aug 27, 2024	Benchmarking, Reinforcement Learning (RL)	Unverified
4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions	Dec 31, 2022	Autonomous Driving, Benchmarking	Unverified
Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD)	Jun 15, 2020	Benchmarking, Domain Adaptation	Unverified
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models	Apr 10, 2024	Benchmarking, Denoising	Unverified
Graph Alignment for Benchmarking Graph Neural Networks and Learning Positional Encodings	May 19, 2025	Benchmarking, Combinatorial Optimization	Unverified
Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices	Jun 2, 2025	Benchmarking	Unverified
Helsinki Deblur Challenge 2021: description of photographic data	May 21, 2021	Benchmarking, Deblurring	Unverified
A Survey on Semi-Supervised Learning for Delayed Partially Labelled Data Streams	Jun 16, 2021	Active Learning, Benchmarking	Unverified
A Survey on Preserving Fairness Guarantees in Changing Environments	Nov 14, 2022	Benchmarking, Decision Making	Unverified
Benchmarking Reasoning Robustness in Large Language Models	Mar 6, 2025	Benchmarking, Math	Unverified
Benchmarking real-time monitoring strategies for ethanol production from lignocellulosic biomass	Jan 29, 2021	Benchmarking	Unverified
Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods	May 17, 2021	Benchmarking, Diversity	Unverified
Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining	Jan 16, 2022	Benchmarking, Language Modelling	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified