Benchmarking

Papers

Title	Date	Tasks	Status
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	Benchmarking, Legal Reasoning	Unverified
Benchmarking the Robustness of Quantized Models	Apr 8, 2023	Benchmarking, Quantization	Unverified
Benchmarking the Robustness of Panoptic Segmentation for Automated Driving	Feb 23, 2024	Benchmarking, Decision Making	Unverified
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models	Apr 1, 2025	Benchmarking, Conversational Question Answering	Unverified
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery	Apr 5, 2022	Benchmarking, object-detection	Unverified
A Baseline Method for Removing Invisible Image Watermarks using Deep Image Prior	Feb 19, 2025	Benchmarking, Misinformation	Unverified
Benchmarking the Robustness of Instance Segmentation Models	Sep 2, 2021	Benchmarking, Domain Adaptation	Unverified
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem	Jul 13, 2024	Benchmarking, Deep Learning	Unverified
Genetic algorithm for feature selection of EEG heterogeneous data	Mar 12, 2021	Benchmarking, EEG	Unverified
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training	Apr 30, 2025	Benchmarking	Unverified
Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT	Nov 1, 2020	Automatic Post-Editing, Benchmarking	Unverified
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance	Mar 23, 2023	Benchmarking, Data Augmentation	Unverified
Benchmarking the rationality of AI decision making using the transitivity axiom	Feb 14, 2025	Benchmarking, Decision Making	Unverified
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN)	Nov 23, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection	Apr 11, 2023	Adversarial Attack, Adversarial Robustness	Unverified
AutoLay: Benchmarking amodal layout estimation for autonomous driving	Aug 20, 2021	Amodal Layout Estimation, Autonomous Driving	Unverified
Benchmarking the Neural Linear Model for Regression	Dec 18, 2019	Bayesian Optimization, Benchmarking	Unverified
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model	Jan 20, 2025	Benchmarking	Unverified
Benchmarking the Impact of Noise on Deep Learning-based Classification of Atrial Fibrillation in 12-Lead ECG	Mar 24, 2023	Atrial Fibrillation Detection, Benchmarking	Unverified
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified
Benchmarking the human brain against computational architectures	May 15, 2023	Benchmarking, Computational Efficiency	Unverified
A Conformance Checking-based Approach for Drift Detection in Business Processes	Jul 9, 2019	Benchmarking, Drift Detection	Unverified
FunBench: Benchmarking Fundus Reading Skills of MLLMs	Mar 2, 2025	Anatomy, Benchmarking	Unverified
Efficient Pauli channel estimation with logarithmic quantum memory	Sep 25, 2023	Benchmarking	Unverified
AutoAI-TS: AutoAI for Time Series Forecasting	Feb 24, 2021	Benchmarking, BIG-bench Machine Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified