Benchmarking

Papers

Showing 4151–4175 of 5548 papers

Title	Date	Tasks	Status
PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs	Jun 24, 2024	Benchmarking, Machine Unlearning	Unverified
Pitfalls of topology-aware image segmentation	Dec 19, 2024	Benchmarking, Image Segmentation	Unverified
pix2pockets: Shot Suggestions in 8-Ball Pool from a Single Image in the Wild	Apr 16, 2025	Benchmarking, object-detection	Unverified
A Computer Vision System to Localize and Classify Wastes on the Streets	Oct 31, 2017	Benchmarking	Unverified
Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities	May 16, 2025	Benchmarking	Unverified
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects	Jun 16, 2025	Benchmarking, Instance Segmentation	Unverified
PKLot-A robust dataset for parking lot classification	Jul 1, 2015	Benchmarking, Classification	Unverified
PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI	May 19, 2025	Benchmarking, Minecraft	Unverified
BEADs: Bias Evaluation Across Domains	Jun 6, 2024	Benchmarking, Bias Detection	Unverified
BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs	Apr 15, 2025	Benchmarking, Subgraph Counting	Unverified
Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment	Feb 17, 2025	Benchmarking, Common Sense Reasoning	Unverified
BBOB Instance Analysis: Landscape Properties and Algorithm Performance across Problem Instances	Nov 29, 2022	Benchmarking	Unverified
Bayesian Neural Networks at Scale: A Performance Analysis and Pruning Study	May 23, 2020	Benchmarking, Network Pruning	Unverified
Bayesian Multi-type Mean Field Multi-agent Imitation Learning	Dec 1, 2020	Benchmarking, Imitation Learning	Unverified
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs	Apr 16, 2024	Benchmarking, Language Modelling	Unverified
A Bayesian Model for Bivariate Causal Inference	Dec 24, 2018	Benchmarking, Causal Inference	Unverified
A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking	Jun 21, 2023	Adversarial Robustness, Benchmarking	Unverified
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking	Feb 28, 2023	Adversarial Robustness, Benchmarking	Unverified
Barkour: Benchmarking Animal-level Agility with Quadruped Robots	May 24, 2023	Benchmarking, Navigate	Unverified
BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali	Oct 16, 2023	Benchmarking, Data Augmentation	Unverified
Point Cloud Compression and Objective Quality Assessment: A Survey	Jun 28, 2025	Autonomous Driving, Benchmarking	Unverified
Point Cloud Objective Quality: Benchmarking Features and Quality Evaluation	Apr 4, 2025	Attribute, Benchmarking	Unverified
Polarization and Index Modulations: a Theoretical and Practical Perspective	Mar 20, 2018	Benchmarking, Navigate	Unverified
Policy Entropy for Out-of-Distribution Classification	May 25, 2020	Benchmarking, Classification	Unverified
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding	May 23, 2025	Benchmarking, Spatial Reasoning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified