Benchmarking

Papers

Title	Date	Tasks	Status	Score
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers	Mar 31, 2023	Benchmarking, image-classification	Code Available	5
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available	5
Benchmarking Robustness of Deep Learning Classifiers Using Two-Factor Perturbation	Mar 2, 2021	Benchmarking, Deep Learning	Code Available	5
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators	Sep 21, 2024	Benchmarking	Code Available	5
Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset	Nov 13, 2024	Anomaly Detection, Benchmarking	Code Available	5
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios	Mar 8, 2025	Benchmarking, Diagnostic	Code Available	5
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation	Jun 26, 2025	Benchmarking, Transfer Learning	Code Available	5
Deep Jansen-Rit Parameter Inference for Model-Driven Analysis of Brain Activity	Jun 7, 2024	Benchmarking, EEG	Code Available	5
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	Benchmarking, Drug Discovery	Code Available	5
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships	Jul 17, 2024	Benchmarking	Code Available	5
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering	May 21, 2025	Benchmarking, Language Modeling	Code Available	5
ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms	Jul 15, 2018	Benchmarking	Code Available	5
KhabarChin: Automatic Detection of Important News in the Persian Language	Dec 6, 2023	Articles, Benchmarking	Code Available	5
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals	Jun 7, 2023	Benchmarking, Machine Reading Comprehension	Code Available	5
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives	Nov 13, 2018	Benchmarking, Intrusion Detection	Code Available	5
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions	Jan 5, 2023	Articles, Benchmarking	Code Available	5
Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning	Oct 9, 2024	Benchmarking, Fairness	Code Available	5
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms	Oct 16, 2019	Bayesian Inference, Benchmarking	Code Available	5
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen	Mar 3, 2022	Benchmarking	Code Available	5
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry	Oct 15, 2024	Benchmarking	Code Available	5
An Integrated Framework for Multi-Granular Explanation of Video Summarization	May 16, 2024	Benchmarking, Panoptic Segmentation	Code Available	5
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	Benchmarking, Ethics	Code Available	5
KArSL: Arabic Sign Language Database	Jan 1, 2021	Benchmarking, Sign Language Recognition	Code Available	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	5
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement	Mar 16, 2023	Benchmarking, Demosaicking	Code Available	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified