Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1601–1625 of 5548 papers

Title	Date	Tasks	Status	Score
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry	Oct 15, 2024	Benchmarking	CodeCode Available	5
An Integrated Framework for Multi-Granular Explanation of Video Summarization	May 16, 2024	BenchmarkingPanoptic Segmentation	CodeCode Available	5
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	BenchmarkingEthics	CodeCode Available	5
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers	Mar 31, 2023	Benchmarkingimage-classification	CodeCode Available	5
Selecting the motion ground truth for loose-fitting wearables: benchmarking optical MoCap methods	Jul 21, 2023	Benchmarking	CodeCode Available	5
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions	Nov 19, 2023	Bayesian OptimizationBenchmarking	CodeCode Available	5
An implementation of the "Guess who?" game using CLIP	Nov 30, 2021	Benchmarking	CodeCode Available	5
Adjusting Pretrained Backbones for Performativity	Oct 6, 2024	BenchmarkingDeep Learning	CodeCode Available	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	BenchmarkingGoal-Oriented Dialogue Systems	CodeCode Available	5
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis	Mar 18, 2025	BenchmarkingDrug Response Prediction	CodeCode Available	5
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations	Jun 29, 2022	Benchmarking	CodeCode Available	5
Knowledge Enhanced Conditional Imputation for Healthcare Time-series	Dec 27, 2023	BenchmarkingImputation	CodeCode Available	5
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	BenchmarkingHallucination	CodeCode Available	5
An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality	Jul 20, 2021	BenchmarkingDiagnostic	CodeCode Available	5
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	BenchmarkingMachine Reading Comprehension	CodeCode Available	5
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios	Mar 8, 2025	BenchmarkingDiagnostic	CodeCode Available	5
MANTRA: The Manifold Triangulations Assemblage	Oct 3, 2024	Benchmarking	CodeCode Available	5
KArSL: Arabic Sign Language Database	Jan 1, 2021	BenchmarkingSign Language Recognition	CodeCode Available	5
An Experimental Study of the Transferability of Spectral Graph Networks	Dec 18, 2020	BenchmarkingGeneral Classification	CodeCode Available	5
Benchmarking Classic and Learned Navigation in Complex 3D Environments	Jan 30, 2019	Benchmarking	CodeCode Available	5
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen	Mar 3, 2022	Benchmarking	CodeCode Available	5
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	BenchmarkingLanguage Modeling	CodeCode Available	5
An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data	Dec 6, 2024	BenchmarkingImputation	CodeCode Available	5
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models	Jun 15, 2024	BenchmarkingData Augmentation	CodeCode Available	5
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards	Feb 16, 2025	BenchmarkingGPU	CodeCode Available	5

Show:10 25 50

← PrevPage 65 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified