Benchmarking

Papers

Showing 5251–5300 of 5548 papers

Title	Date	Tasks	Status
PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding	Dec 6, 2018	3D Instance Segmentation, 3D Semantic Segmentation	Code Available
CVC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models	Jun 2, 2025	Benchmarking	Code Available
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022	Jan 31, 2023	Action Detection, Benchmarking	Code Available
PATCH! Psychometrics-AssisTed BenCHmarking of Large Language Models against Human Populations: A Case Study of Proficiency in 8th Grade Mathematics	Apr 2, 2024	Benchmarking	Code Available
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models	Jul 23, 2024	Benchmarking, Segmentation	Code Available
A Position Paper on the Automatic Generation of Machine Learning Leaderboards	May 23, 2025	Benchmarking, Position	Code Available
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Jan 14, 2025	Benchmarking, Graph Representation Learning	Code Available
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees	Apr 24, 2024	Benchmarking, Molecular Property Prediction	Code Available
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset	May 30, 2025	Benchmarking, Multiple Instance Learning	Code Available
Attribution of Predictive Uncertainties in Classification Models	Jul 19, 2021	Benchmarking, Classification	Code Available
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs	Sep 26, 2024	Benchmarking, Conformal Prediction	Code Available
Agentic-HLS: An agentic reasoning based high-level synthesis system using large language models (AI for EDA workshop 2024)	Dec 2, 2024	Benchmarking, High-Level Synthesis	Code Available
Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level	Apr 8, 2024	Benchmarking	Code Available
Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA	Jul 22, 2024	Benchmarking, Contrastive Learning	Code Available
Custom Dual Transportation Mode Detection by Smartphone Devices Exploiting Sensor Diversity	Oct 12, 2018	Activity Recognition, Benchmarking	Code Available
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems	Jun 9, 2025	Attribute, Benchmarking	Code Available
PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models	Dec 9, 2024	Benchmarking, Instruction Following	Code Available
CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants	Oct 28, 2024	Benchmarking	Code Available
CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection	Jun 21, 2021	Benchmarking, Domain Adaptation	Code Available
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels	Nov 21, 2024	Benchmarking, Machine Translation	Code Available
Ants can orienteer a thief in their robbery	Apr 15, 2020	Benchmarking, Combinatorial Optimization	Code Available
3DOS: Towards 3D Open Set Learning -- Benchmarking and Understanding Semantic Novelty Detection on Point Clouds	Jul 23, 2022	Benchmarking, Novelty Detection	Code Available
Benchmarking Generative Latent Variable Models for Speech	Feb 22, 2022	Benchmarking, Image Generation	Code Available
Benchmarking Generative AI Models for Deep Learning Test Input Generation	Dec 23, 2024	Benchmarking, Deep Learning	Code Available
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding	Sep 26, 2022	Benchmarking, Natural Language Queries	Code Available
C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation	Jun 16, 2025	Benchmarking, Recommendation Systems	Code Available
Performance Evaluation of Real-Time Object Detection for Electric Scooters	May 5, 2024	Autonomous Vehicles, Benchmarking	Code Available
Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis	Feb 14, 2018	Benchmarking, Causal Inference	Code Available
A General Benchmarking Framework for Text Generation	Dec 1, 2020	Benchmarking, Knowledge Graphs	Code Available
Performance Modeling of Data Storage Systems using Generative Models	Jul 5, 2023	Benchmarking	Code Available
Zero-Shot Hyperspectral Pansharpening Using Hysteresis-Based Tuning for Spectral Quality Control	May 22, 2025	Benchmarking, Pansharpening	Code Available
Vector-Based Data Improves Left-Right Eye-Tracking Classifier Performance After a Covariate Distributional Shift	Jul 31, 2022	Benchmarking, EEG	Code Available
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge	Dec 18, 2024	Benchmarking, World Knowledge	Code Available
Periodic Extrapolative Generalisation in Neural Networks	Sep 21, 2022	Benchmarking	Code Available
Standardizing Structural Causal Models	Jun 17, 2024	Benchmarking, Causal Inference	Code Available
Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform	Jan 5, 2022	Benchmarking	Code Available
StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis	May 28, 2025	Benchmarking	Code Available
Benchmarking framework for machine learning classification from fNIRS data	Mar 3, 2023	Benchmarking, Brain Computer Interface	Code Available
PersoBench: Benchmarking Personalized Response Generation in Large Language Models	Oct 4, 2024	Benchmarking, Dialogue Generation	Code Available
STA: Self-controlled Text Augmentation for Improving Text Classifications	Feb 24, 2023	Benchmarking, Text Augmentation	Code Available
Architecture Analysis and Benchmarking of 3D U-shaped Deep Learning Models for Thoracic Anatomical Segmentation	Feb 5, 2024	Benchmarking, Image Segmentation	Code Available
XCompress: LLM assisted Python-based text compression toolkit	Aug 12, 2024	Benchmarking, Language Modeling	Code Available
A Framework for Generating Informative Benchmark Instances	May 29, 2022	Benchmarking	Code Available
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?	Oct 26, 2022	Benchmarking, Question Answering	Code Available
Towards Robust Metrics for Concept Representation Evaluation	Jan 25, 2023	Benchmarking, Disentanglement	Code Available
Statistical Multicriteria Evaluation of LLM-Generated Text	Jun 22, 2025	Benchmarking, Diversity	Code Available
ANTHROPOS-V: benchmarking the novel task of Crowd Volume Estimation	Jan 3, 2025	Benchmarking, Crowd Counting	Code Available
Answer Consolidation: Formulation and Benchmarking	Apr 29, 2022	Benchmarking, Question Answering	Code Available
A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches	May 22, 2023	Benchmarking, Classification	Code Available
A novel evaluation methodology for supervised Feature Ranking algorithms	Jul 9, 2022	Benchmarking, Feature Importance	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified