Benchmarking

Papers

Showing 2401–2450 of 5548 papers

Title	Date	Tasks	Status
Benchmarking the Robustness of Quantized Models	Apr 8, 2023	Benchmarking, Quantization	Unverified
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving	Feb 23, 2024	Benchmarking, Decision Making	Unverified
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models	Apr 1, 2025	Benchmarking, Conversational Question Answering	Unverified
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery	Apr 5, 2022	Benchmarking, object-detection	Unverified
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior	Feb 19, 2025	Benchmarking, Misinformation	Unverified
Benchmarking the Robustness of Instance Segmentation Models	Sep 2, 2021	Benchmarking, Domain Adaptation	Unverified
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem	Jul 13, 2024	Benchmarking, Deep Learning	Unverified
Generalized Conflict-directed Search for Optimal Ordering Problems	Mar 31, 2021	Benchmarking, Scheduling	Unverified
Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey	Jun 23, 2025	Benchmarking, Survey	Unverified
Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT	Nov 1, 2020	Automatic Post-Editing, Benchmarking	Unverified
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance	Mar 23, 2023	Benchmarking, Data Augmentation	Unverified
Benchmarking the rationality of AI decision making using the transitivity axiom	Feb 14, 2025	Benchmarking, Decision Making	Unverified
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN)	Nov 23, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization	May 23, 2022	Benchmarking, Deep Reinforcement Learning	Unverified
Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection	Apr 11, 2023	Adversarial Attack, Adversarial Robustness	Unverified
AutoLay: Benchmarking amodal layout estimation for autonomous driving	Aug 20, 2021	Amodal Layout Estimation, Autonomous Driving	Unverified
Benchmarking the Neural Linear Model for Regression	Dec 18, 2019	Bayesian Optimization, Benchmarking	Unverified
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model	Jan 20, 2025	Benchmarking	Unverified
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow	Feb 14, 2025	Benchmarking	Unverified
General Scales Unlock AI Evaluation with Explanatory and Predictive Power	Mar 9, 2025	Benchmarking, Specificity	Unverified
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges	Jun 27, 2024	Benchmarking, Clinical Knowledge	Unverified
Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG	Mar 24, 2023	Atrial Fibrillation Detection, Benchmarking	Unverified
Benchmarking the human brain against computational architectures	May 15, 2023	Benchmarking, Computational Efficiency	Unverified
A Conformance Checking-based Approach for Drift Detection in Business Processes	Jul 9, 2019	Benchmarking, Drift Detection	Unverified
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases	May 25, 2024	Benchmarking, Hallucination	Unverified
AutoAI-TS: AutoAI for Time Series Forecasting	Feb 24, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
Benchmarking the Gerchberg-Saxton Algorithm	May 18, 2020	Benchmarking	Unverified
ALdataset: a benchmark for pool-based active learning	Oct 16, 2020	Active Learning, Benchmarking	Unverified
Benchmarking the Fidelity and Utility of Synthetic Relational Data	Oct 4, 2024	Benchmarking, Feature Importance	Unverified
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing	Jun 30, 2024	Benchmarking, counterfactual	Unverified
Generalised Gaussian Process Latent Variable Models (GPLVM) with Stochastic Variational Inference	Feb 25, 2022	Benchmarking, Dimensionality Reduction	Unverified
AA3DNet: Attention Augmented Real Time 3D Object Detection	Jul 26, 2021	3D Object Detection, Autonomous Vehicles	Unverified
Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web	May 1, 2014	Benchmarking, Entity Linking	Unverified
Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans	Jul 15, 2023	Benchmarking, Dimensionality Reduction	Unverified
A Computer Vision System to Localize and Classify Wastes on the Streets	Oct 31, 2017	Benchmarking	Unverified
Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy	Apr 12, 2024	Benchmarking, Cell Segmentation	Unverified
Benchmarking the Capabilities of Large Language Models in Transportation System Engineering: Accuracy, Consistency, and Reasoning Behaviors	Aug 15, 2024	Benchmarking, Management	Unverified
Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets	Apr 19, 2021	Benchmarking, Intrusion Detection	Unverified
A Universal Protocol to Benchmark Camera Calibration for Sports	Apr 15, 2024	Benchmarking, Camera Calibration	Unverified
A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration	Jun 1, 2013	Benchmarking	Unverified
A Unified Taylor Framework for Revisiting Attribution Methods	Aug 21, 2020	Benchmarking, Decision Making	Unverified
Benchmarking the Accuracy and Robustness of Feedback Alignment Algorithms	Aug 30, 2021	Benchmarking	Unverified
A Latent Fingerprint in the Wild Database	Apr 3, 2023	Benchmarking	Unverified
Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices	Mar 21, 2022	Benchmarking, GPU	Unverified
Benchmarking terminology building capabilities of ChatGPT on an English-Russian Fashion Corpus	Dec 4, 2024	Benchmarking	Unverified
A Unified Study of Machine Learning Explanation Evaluation Metrics	Mar 27, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Benchmarking Table Comprehension In The Wild	Dec 13, 2024	Benchmarking, Question Answering	Unverified
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking	May 26, 2025	Benchmarking, Optical Flow Estimation	Unverified
A Large-scale Study on Training Sample Memorization in Generative Modeling	Jan 1, 2021	Benchmarking, Memorization	Unverified
Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models	Mar 30, 2025	Benchmarking, Relational Reasoning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified