Benchmarking

Papers


Showing 1601–1650 of 5548 papers

Title	Date	Tasks	Status	Score
An Integrated Framework for Multi-Granular Explanation of Video Summarization	May 16, 2024	Benchmarking, Panoptic Segmentation	Code Available	5
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks	Feb 2, 2025	Benchmarking	Code Available	5
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	Benchmarking, Ethics	Code Available	5
Learned Sorted Table Search and Static Indexes in Small Model Space	Jul 19, 2021	Benchmarking, Open-Ended Question Answering	Code Available	5
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking	Jul 30, 2018	Benchmarking, feature selection	Code Available	5
Large-scale Ridesharing DARP Instances Based on Real Travel Demand	May 30, 2023	Benchmarking	Code Available	5
An implementation of the "Guess who?" game using CLIP	Nov 30, 2021	Benchmarking	Code Available	5
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?	May 19, 2021	Benchmarking, Sentence	Code Available	5
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	Benchmarking, Drug Discovery	Code Available	5
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	Benchmarking, Deep Learning	Code Available	5
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis	Mar 18, 2025	Benchmarking, Drug Response Prediction	Code Available	5
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations	Jun 29, 2022	Benchmarking	Code Available	5
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available	5
An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality	Jul 20, 2021	Benchmarking, Diagnostic	Code Available	5
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges	Mar 11, 2025	Benchmarking	Code Available	5
Learnability and Complexity of Quantum Samples	Oct 22, 2020	Benchmarking	Code Available	5
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book	Jun 1, 2025	Benchmarking	Code Available	5
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods	Jul 21, 2023	Benchmarking	Code Available	5
An Experimental Study of the Transferability of Spectral Graph Networks	Dec 18, 2020	Benchmarking, General Classification	Code Available	5
Benchmarking Classic and Learned Navigation in Complex 3D Environments	Jan 30, 2019	Benchmarking	Code Available	5
An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data	Dec 6, 2024	Benchmarking, Imputation	Code Available	5
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers	Mar 31, 2023	Benchmarking, image-classification	Code Available	5
Language-based Image Colorization: A Benchmark and Beyond	Mar 19, 2025	Benchmarking, Colorization	Code Available	5
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models	Jun 15, 2024	Benchmarking, Data Augmentation	Code Available	5
Benchmarking ChatGPT on Algorithmic Reasoning	Apr 4, 2024	Benchmarking	Code Available	5
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology	Apr 24, 2023	Benchmarking, Decision Making	Code Available	5
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios	Mar 8, 2025	Benchmarking, Diagnostic	Code Available	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	5
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk	May 21, 2019	Bayesian Inference, Benchmarking	Code Available	5
Knowledge Enhanced Conditional Imputation for Healthcare Time-series	Dec 27, 2023	Benchmarking, Imputation	Code Available	5
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions	Nov 19, 2023	Bayesian Optimization, Benchmarking	Code Available	5
KhabarChin: Automatic Detection of Important News in the Persian Language	Dec 6, 2023	Articles, Benchmarking	Code Available	5
A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection	Nov 23, 2018	Benchmarking, Cervical Nucleus Detection	Code Available	5
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen	Mar 3, 2022	Benchmarking	Code Available	5
A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers	Nov 6, 2021	Benchmarking, Retinal Vessel Segmentation	Code Available	5
KArSL: Arabic Sign Language Database	Jan 1, 2021	Benchmarking, Sign Language Recognition	Code Available	5
A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data	Nov 11, 2019	Benchmarking, Decision Making	Code Available	5
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	Benchmarking, Language Modeling	Code Available	5
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement	Mar 16, 2023	Benchmarking, Demosaicking	Code Available	5
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	Benchmarking, GPU	Code Available	5
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge	May 8, 2025	Benchmarking	Code Available	5
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	Benchmarking, Machine Reading Comprehension	Code Available	5
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction	Jul 3, 2025	Benchmarking	Code Available	5
Benchmarking AutoML algorithms on a collection of synthetic classification problems	Dec 6, 2022	AutoML, Benchmarking	Code Available	5
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment	Feb 13, 2023	Benchmarking, Segmentation	Code Available	5
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs	May 29, 2025	Benchmarking, Fairness	Code Available	5
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability	Feb 3, 2020	Benchmarking, Discrete Choice Models	Code Available	5
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p	Oct 17, 2024	Benchmarking, regression	Code Available	5
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs	Apr 10, 2024	Benchmarking, knowledge editing	Code Available	5
Benchmarking a transformer-FREE model for ad-hoc retrieval	Apr 1, 2021	Benchmarking, CPU	Code Available	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified