Benchmarking

Papers


Showing 4051–4100 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks	Mar 15, 2024	Adversarial Attack, Adversarial Robustness	Unverified
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents	Jun 19, 2025	Benchmarking	Unverified
oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving	May 13, 2024	Attribute, Autonomous Driving	Unverified
Benchmarking Adversarial Robustness of Compressed Deep Learning Models	Aug 16, 2023	Adversarial Robustness, Benchmarking	Unverified
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms	May 22, 2025	Adversarial Attack, Benchmarking	Unverified
Out of Distribution Performance of State of Art Vision Model	Jan 25, 2023	Benchmarking	Unverified
Benchmarking Adversarial Robustness	Dec 26, 2019	Adversarial Attack, Adversarial Robustness	Unverified
Overconfident Oracles: Limitations of In Silico Sequence Design Benchmarking	Feb 24, 2025	Benchmarking	Unverified
Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling	May 2, 2025	Benchmarking	Unverified
Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving	May 1, 2014	Benchmarking	Unverified
Benchmarking Adversarially Robust Quantum Machine Learning at Scale	Nov 23, 2022	Adversarial Attack, Adversarial Attack Detection	Unverified
OVQA: A Clinically Generated Visual Question Answering Dataset	Jul 7, 2022	Benchmarking, Medical Visual Question Answering	Unverified
Paddy Doctor: A Visual Image Dataset for Automated Paddy Disease Classification and Benchmarking	May 23, 2022	Benchmarking, Classification	Unverified
Benchmarking adversarial attacks and defenses for time-series data	Aug 30, 2020	Adversarial Defense, Benchmarking	Unverified
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms	Oct 5, 2024	Benchmarking, GPU	Unverified
Benchmarking Advanced Text Anonymisation Methods: A Comparative Study on Novel and Traditional Approaches	Apr 22, 2024	Benchmarking, Diversity	Unverified
Benchmarking Adaptive Intelligence and Computer Vision on Human-Robot Collaboration	Sep 30, 2024	Benchmarking, Intent Detection	Unverified
Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances	Aug 3, 2023	Benchmarking	Unverified
Paradigm Shift in Sustainability Disclosure Analysis: Empowering Stakeholders with CHATREPORT, a Language Model-Based Tool	Jun 27, 2023	Benchmarking, Language Modeling	Unverified
Para-Lane: Multi-Lane Dataset Registering Parallel Scans for Benchmarking Novel View Synthesis	Feb 21, 2025	3DGS, Autonomous Driving	Unverified
Benchmarking Active Learning Strategies for Materials Optimization and Discovery	Apr 12, 2022	Active Learning, Benchmarking	Unverified
A critical analysis of metrics used for measuring progress in artificial intelligence	Aug 6, 2020	Benchmarking	Unverified
True Online TD-Replan(lambda) Achieving Planning through Replaying	Jan 31, 2025	Benchmarking	Unverified
Benchmarking Active Learning for NILM	Nov 24, 2024	Active Learning, Benchmarking	Unverified
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles	Jan 13, 2025	Articles, Benchmarking	Unverified
Parsing Any Domain English text to CoNLL dependencies	May 1, 2012	Benchmarking, Dependency Parsing	Unverified
Trust but Verify: Programmatic VLM Evaluation in the Wild	Oct 17, 2024	Benchmarking, Language Modelling	Unverified
Participatory Personalization in Classification	Feb 8, 2023	Benchmarking, Classification	Unverified
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems	Nov 23, 2016	Benchmarking, Object	Unverified
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques	May 22, 2025	Benchmarking	Unverified
Benchmarking a Benchmark: How Reliable is MS-COCO?	Nov 5, 2023	Benchmarking, image-classification	Unverified
PASTA: A Dataset for Modeling Participant States in Narratives	Jul 31, 2022	Benchmarking, Common Sense Reasoning	Unverified
Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval	May 28, 2025	Benchmarking, Recommendation Systems	Unverified
PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database	Jun 23, 2021	Benchmarking, Clustering	Unverified
PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms	May 4, 2021	Benchmarking	Unverified
PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology	May 26, 2025	Benchmarking, Prognosis	Unverified
Patherea: Cell Detection and Classification for the 2020s	Dec 21, 2024	Benchmarking, Cell Detection	Unverified
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis	May 27, 2024	Benchmarking	Unverified
A Continuously Growing Dataset of Sentential Paraphrases	Aug 1, 2017	Benchmarking, Paraphrase Identification	Unverified
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications	Jul 12, 2023	Benchmarking	Unverified
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite	May 20, 2023	Benchmarking	Unverified
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints	May 23, 2025	Benchmarking	Unverified
Object Pose Estimation in Robotics Revisited	Jun 6, 2019	3D Pose Estimation, 6D Pose Estimation	Unverified
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction	Nov 8, 2024	3D Reconstruction, Benchmarking	Unverified
Benchmarking 3D Human Pose Estimation Models Under Occlusions	Apr 14, 2025	3D Human Pose Estimation, Benchmarking	Unverified
IN-Sight: Interactive Navigation through Sight	Aug 1, 2024	Benchmarking, Navigate	Unverified
Benchmarking 2D Egocentric Hand Pose Datasets	Sep 11, 2024	Activity Recognition, Benchmarking	Unverified
Benchmark for Antibody Binding Affinity Maturation and Design	May 23, 2025	Benchmarking	Unverified
Perception Test 2023: A Summary of the First Challenge And Outcome	Dec 20, 2023	Benchmarking, Grounded Video Question Answering	Unverified
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Nov 29, 2024	Benchmarking, Grounded Video Question Answering	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified