Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2401–2450 of 5548 papers

Title	Date	Tasks	Status	Score
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition	Jul 16, 2025	BenchmarkingKnowledge Distillation	CodeCode Available	5
Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation	Sep 24, 2024	BenchmarkingMovie Recommendation	CodeCode Available	5
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics	Mar 7, 2019	BenchmarkingClustering	CodeCode Available	5
Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis	Mar 18, 2022	BenchmarkingObject Recognition	CodeCode Available	5
Are Synthetic Corruptions A Reliable Proxy For Real-World Corruptions?	May 7, 2025	BenchmarkingSemantic Segmentation	CodeCode Available	5
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective	Dec 10, 2024	Benchmarking	CodeCode Available	5
Graph-theoretical approach to robust 3D normal extraction of LiDAR data	May 23, 2022	Benchmarking	CodeCode Available	5
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations	Jan 24, 2022	BenchmarkingDrug Discovery	CodeCode Available	5
Benchmarking Learning Efficiency in Deep Reservoir Computing	Sep 29, 2022	Benchmarking	CodeCode Available	5
Learning Conjoint Attentions for Graph Neural Nets	Feb 5, 2021	BenchmarkingGraph Attention	CodeCode Available	5
GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking	May 24, 2023	BenchmarkingGraph Mining	CodeCode Available	5
DQI: Measuring Data Quality in NLP	May 2, 2020	Active LearningBenchmarking	CodeCode Available	5
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation	Apr 21, 2025	Benchmarking	CodeCode Available	5
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction	Jul 3, 2025	Benchmarking	CodeCode Available	5
A General Benchmarking Framework for Text Generation	Dec 1, 2020	BenchmarkingKnowledge Graphs	CodeCode Available	5
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization	Jul 18, 2022	Benchmarking	CodeCode Available	5
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	BenchmarkingChange Detection	CodeCode Available	5
Separating form and meaning: Using self-consistency to quantify task understanding across multiple senses	May 19, 2023	BenchmarkingForm	CodeCode Available	5
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	BenchmarkingComputational Efficiency	CodeCode Available	5
A Closer Look at Temporal Sentence Grounding in Videos: Dataset and Metric	Jan 22, 2021	BenchmarkingSentence	CodeCode Available	5
Benchmarking Large Language Model Uncertainty for Prompt Optimization	Sep 16, 2024	BenchmarkingDiversity	CodeCode Available	5
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks	Jan 7, 2024	BenchmarkingGraph Neural Network	CodeCode Available	5
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems	Oct 8, 2023	Benchmarking	CodeCode Available	5
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring	Feb 10, 2025	Benchmarking	CodeCode Available	5
Evaluating the Transferability of Machine-Learned Force Fields for Material Property Modeling	Jan 10, 2023	BenchmarkingGraph Neural Network	CodeCode Available	5
Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction	May 23, 2023	Aspect-Based Sentiment AnalysisAspect-Based Sentiment Analysis (ABSA)	CodeCode Available	5
Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments	Feb 20, 2023	BenchmarkingRobot Navigation	CodeCode Available	5
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks	Feb 2, 2025	Benchmarking	CodeCode Available	5
Learn How to Query from Unlabeled Data Streams in Federated Learning	Dec 11, 2024	BenchmarkingDecision Making	CodeCode Available	5
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking	Jul 30, 2018	Benchmarkingfeature selection	CodeCode Available	5
Geological Inference from Textual Data using Word Embeddings	Apr 10, 2025	BenchmarkingWord Embeddings	CodeCode Available	5
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search	Jan 26, 2025	BenchmarkingDiversity	CodeCode Available	5
Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation	Jul 17, 2020	BenchmarkingDisentanglement	CodeCode Available	5
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks	Nov 15, 2023	BenchmarkingNetwork Pruning	CodeCode Available	5
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M	May 15, 2025	BenchmarkingMemorization	CodeCode Available	5
Do LLM Evaluators Prefer Themselves for a Reason?	Apr 4, 2025	BenchmarkingCode Generation	CodeCode Available	5
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Jan 22, 2025	Benchmarking	CodeCode Available	5
Flexible Generation of Preference Data for Recommendation Analysis	Jul 23, 2024	BenchmarkingRecommendation Systems	CodeCode Available	5
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset	Feb 8, 2024	Benchmarking	CodeCode Available	5
Graph Convolutional Networks Meet with High Dimensionality Reduction	Nov 7, 2019	BenchmarkingDimensionality Reduction	CodeCode Available	5
Hierarchical Neural Networks for Sequential Sentence Classification in Medical Scientific Abstracts	Aug 19, 2018	BenchmarkingClassification	CodeCode Available	5
Strong and Simple Baselines for Multimodal Utterance Embeddings	May 14, 2019	Benchmarking	CodeCode Available	5
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider	Apr 26, 2025	BenchmarkingGPU	CodeCode Available	5
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams	Jun 17, 2024	AllBenchmarking	CodeCode Available	5
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	BenchmarkingFairness	CodeCode Available	5
Benchmarking Large Language Models for Math Reasoning Tasks	Aug 20, 2024	BenchmarkingIn-Context Learning	CodeCode Available	5
Benchmarking Large Language Models for Image Classification of Marine Mammals	Oct 22, 2024	Benchmarkingimage-classification	CodeCode Available	5
Divergent Creativity in Humans and Large Language Models	May 13, 2024	Benchmarking	CodeCode Available	5
Generalization and Regularization in DQN	Sep 29, 2018	Atari GamesBenchmarking	CodeCode Available	5
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data	Feb 22, 2024	Benchmarking	CodeCode Available	5

Show:10 25 50

← PrevPage 49 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified