Benchmarking

Papers

Title	Date	Tasks	Status
Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks	Oct 30, 2022	Anomaly Detection, Benchmarking	Code Available
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?	Mar 27, 2025	Benchmarking, Specificity	Code Available
TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability	Jun 4, 2024	Benchmarking, Language Modeling	Code Available
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library	Oct 3, 2016	Adversarial Attack, Adversarial Defense	Code Available
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge	May 8, 2025	Benchmarking	Code Available
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment	Feb 13, 2023	Benchmarking, Segmentation	Code Available
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing	Nov 1, 2024	Benchmarking, Semantic Segmentation	Code Available
TSPP: A Unified Benchmarking Tool for Time-series Forecasting	Dec 28, 2023	Benchmarking, Feature Engineering	Code Available
City-Scale Road Audit System using Deep Learning	Nov 26, 2018	Benchmarking, Deep Learning	Code Available
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift	Apr 19, 2022	Benchmarking, Classification	Code Available
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available
CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing	Jun 30, 2021	Benchmarking, Transfer Learning	Code Available
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding	Dec 23, 2024	Benchmarking	Code Available
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs	May 26, 2025	Benchmarking, Fault localization	Code Available
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning	May 19, 2025	Benchmarking	Code Available
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	Code Available
Random Machines: A bagged-weighted support vector model with free kernel choice	Nov 21, 2019	Benchmarking, regression	Code Available
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions	Oct 5, 2024	Benchmarking, Hallucination	Code Available
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain	Nov 23, 2024	Benchmarking, Diversity	Code Available
Ranking and benchmarking framework for sampling algorithms on synthetic data streams	Jun 17, 2020	Benchmarking, Hyperparameter Optimization	Code Available
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules	Jun 20, 2024	Benchmarking	Code Available
Tunability: Importance of Hyperparameters of Machine Learning Algorithms	Feb 26, 2018	Benchmarking, BIG-bench Machine Learning	Code Available
Temporal receptive field in dynamic graph learning: A comprehensive analysis	Jul 17, 2024	Benchmarking, Dynamic Link Prediction	Code Available
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability	Feb 3, 2020	Benchmarking, Discrete Choice Models	Code Available
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model	Jul 31, 2024	Benchmarking, Large Language Model	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified