Benchmarking

Papers

Showing 5251–5275 of 5548 papers

Title	Date	Tasks	Status
PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding	Dec 6, 2018	3D Instance Segmentation, 3D Semantic Segmentation	Code available
CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models	Jun 2, 2025	Benchmarking	Code available
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022	Jan 31, 2023	Action Detection, Benchmarking	Code available
PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics	Apr 2, 2024	Benchmarking	Code available
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models	Jul 23, 2024	Benchmarking, Segmentation	Code available
A Position Paper on the Automatic Generation of Machine Learning Leaderboards	May 23, 2025	Benchmarking, Position	Code available
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Jan 14, 2025	Benchmarking, Graph Representation Learning	Code available
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees	Apr 24, 2024	Benchmarking, Molecular Property Prediction	Code available
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset	May 30, 2025	Benchmarking, Multiple Instance Learning	Code available
Attribution of Predictive Uncertainties in Classification Models	Jul 19, 2021	Benchmarking, Classification	Code available
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs	Sep 26, 2024	Benchmarking, Conformal Prediction	Code available
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024)	Dec 2, 2024	Benchmarking, High-Level Synthesis	Code available
Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level	Apr 8, 2024	Benchmarking	Code available
Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA	Jul 22, 2024	Benchmarking, Contrastive Learning	Code available
Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity	Oct 12, 2018	Activity Recognition, Benchmarking	Code available
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems	Jun 9, 2025	Attribute, Benchmarking	Code available
PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Dec 9, 2024	Benchmarking, Instruction Following	Code available
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants	Oct 28, 2024	Benchmarking	Code available
CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection	Jun 21, 2021	Benchmarking, Domain Adaptation	Code available
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels	Nov 21, 2024	Benchmarking, Machine Translation	Code available
Ants can orienteer a thief in their robbery	Apr 15, 2020	Benchmarking, Combinatorial Optimization	Code available
3DOS: Towards 3D Open Set Learning -- Benchmarking and Understanding Semantic Novelty Detection on Point Clouds	Jul 23, 2022	Benchmarking, Novelty Detection	Code available
Benchmarking Generative Latent Variable Models for Speech	Feb 22, 2022	Benchmarking, Image Generation	Code available
Benchmarking Generative AI Models for Deep Learning Test Input Generation	Dec 23, 2024	Benchmarking, Deep Learning	Code available
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding	Sep 26, 2022	Benchmarking, Natural Language Queries	Code available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified