Benchmarking

Papers

Showing 4076–4100 of 5548 papers

Title	Date	Tasks	Status
Parsing Any Domain English text to CoNLL dependencies	May 1, 2012	Benchmarking, Dependency Parsing	Unverified
Trust but Verify: Programmatic VLM Evaluation in the Wild	Oct 17, 2024	Benchmarking, Language Modelling	Unverified
Participatory Personalization in Classification	Feb 8, 2023	Benchmarking, Classification	Unverified
'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems	Nov 23, 2016	Benchmarking, Object	Unverified
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques	May 22, 2025	Benchmarking	Unverified
Benchmarking a Benchmark: How Reliable is MS-COCO?	Nov 5, 2023	Benchmarking, image-classification	Unverified
PASTA: A Dataset for Modeling Participant States in Narratives	Jul 31, 2022	Benchmarking, Common Sense Reasoning	Unverified
Yambda-5B -- A Large-Scale Multi-modal Dataset for Ranking And Retrieval	May 28, 2025	Benchmarking, Recommendation Systems	Unverified
PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image Database	Jun 23, 2021	Benchmarking, Clustering	Unverified
PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms	May 4, 2021	Benchmarking	Unverified
PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology	May 26, 2025	Benchmarking, Prognosis	Unverified
Patherea: Cell Detection and Classification for the 2020s	Dec 21, 2024	Benchmarking, Cell Detection	Unverified
A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis	May 27, 2024	Benchmarking	Unverified
A Continuously Growing Dataset of Sentential Paraphrases	Aug 1, 2017	Benchmarking, Paraphrase Identification	Unverified
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications	Jul 12, 2023	Benchmarking	Unverified
Patterns of Convergence and Bound Constraint Violation in Differential Evolution on SBOX-COST Benchmarking Suite	May 20, 2023	Benchmarking	Unverified
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints	May 23, 2025	Benchmarking	Unverified
Object Pose Estimation in Robotics Revisited	Jun 6, 2019	3D Pose Estimation, 6D Pose Estimation	Unverified
Benchmarking 3D multi-coil NC-PDNet MRI reconstruction	Nov 8, 2024	3D Reconstruction, Benchmarking	Unverified
Benchmarking 3D Human Pose Estimation Models Under Occlusions	Apr 14, 2025	3D Human Pose Estimation, Benchmarking	Unverified
IN-Sight: Interactive Navigation through Sight	Aug 1, 2024	Benchmarking, Navigate	Unverified
Benchmarking 2D Egocentric Hand Pose Datasets	Sep 11, 2024	Activity Recognition, Benchmarking	Unverified
Benchmark for Antibody Binding Affinity Maturation and Design	May 23, 2025	Benchmarking	Unverified
Perception Test 2023: A Summary of the First Challenge And Outcome	Dec 20, 2023	Benchmarking, Grounded Video Question Answering	Unverified
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Nov 29, 2024	Benchmarking, Grounded Video Question Answering	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified