Benchmarking

Papers


Showing 5501–5548 of 5548 papers

Title	Date	Tasks	Status
Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks	Oct 30, 2022	Anomaly Detection, Benchmarking	Code Available
CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?	Mar 27, 2025	Benchmarking, Specificity	Code Available
TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability	Jun 4, 2024	Benchmarking, Language Modeling	Code Available
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library	Oct 3, 2016	Adversarial Attack, Adversarial Defense	Code Available
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge	May 8, 2025	Benchmarking	Code Available
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment	Feb 13, 2023	Benchmarking, Segmentation	Code Available
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing	Nov 1, 2024	Benchmarking, Semantic Segmentation	Code Available
TSPP: A Unified Benchmarking Tool for Time-series Forecasting	Dec 28, 2023	Benchmarking, Feature Engineering	Code Available
City-Scale Road Audit System using Deep Learning	Nov 26, 2018	Benchmarking, Deep Learning	Code Available
Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift	Apr 19, 2022	Benchmarking, Classification	Code Available
Advancing and Benchmarking Personalized Tool Invocation for LLMs	May 7, 2025	Benchmarking, World Knowledge	Code Available
CityNet: A Comprehensive Multi-Modal Urban Dataset for Advanced Research in Urban Computing	Jun 30, 2021	Benchmarking, Transfer Learning	Code Available
Chumor 2.0: Towards Benchmarking Chinese Humor Understanding	Dec 23, 2024	Benchmarking	Code Available
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs	May 26, 2025	Benchmarking, Fault localization	Code Available
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning	May 19, 2025	Benchmarking	Code Available
Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems	Oct 14, 2023	Benchmarking	Code Available
Random Machines: A bagged-weighted support vector model with free kernel choice	Nov 21, 2019	Benchmarking, regression	Code Available
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions	Oct 5, 2024	Benchmarking, Hallucination	Code Available
ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain	Nov 23, 2024	Benchmarking, Diversity	Code Available
Ranking and benchmarking framework for sampling algorithms on synthetic data streams	Jun 17, 2020	Benchmarking, Hyperparameter Optimization	Code Available
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules	Jun 20, 2024	Benchmarking	Code Available
Tunability: Importance of Hyperparameters of Machine Learning Algorithms	Feb 26, 2018	Benchmarking, BIG-bench Machine Learning	Code Available
Temporal receptive field in dynamic graph learning: A comprehensive analysis	Jul 17, 2024	Benchmarking, Dynamic Link Prediction	Code Available
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability	Feb 3, 2020	Benchmarking, Discrete Choice Models	Code Available
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model	Jul 31, 2024	Benchmarking, Large Language Model	Code Available
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	Benchmarking, Information Retrieval	Code Available
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions	Jan 1, 2025	Benchmarking	Code Available
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models	Oct 7, 2024	Benchmarking, Segmentation	Code Available
RDF-star2Vec: RDF-star Graph Embeddings for Data Mining	Dec 25, 2023	Benchmarking, Graph Embedding	Code Available
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Mar 22, 2025	Benchmarking, Object	Code Available
An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines	Apr 2, 2021	Benchmarking	Code Available
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age	May 19, 2019	Benchmarking	Code Available
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge	Apr 10, 2025	Adversarial Robustness, Benchmarking	Code Available
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese	May 28, 2025	Benchmarking	Code Available
Changepoint Detection in Noisy Data Using a Novel Residuals Permutation-Based Method (RESPERM): Benchmarking and Application to Single Trial ERPs	Apr 21, 2022	Benchmarking, Change Point Detection	Code Available
TuringQ: Benchmarking AI Comprehension in Theory of Computation	Oct 9, 2024	Benchmarking	Code Available
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations	Aug 26, 2019	Benchmarking, Clustering	Code Available
TweetNERD -- End to End Entity Linking Benchmark for Tweets	Oct 14, 2022	Benchmarking, Entity Linking	Code Available
Real-time cryo-EM data pre-processing with Warp	Jun 14, 2018	Benchmarking, Image Reconstruction	Code Available
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets	Jul 19, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
TextClass Benchmark: A Continuous Elo Rating of LLMs in Social Sciences	Nov 30, 2024	Benchmarking, Classification	Code Available
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence	Apr 10, 2023	Benchmarking, speech-recognition	Code Available
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective	May 28, 2025	Benchmarking, Memorization	Code Available
An Efficient Two-stage Gradient Boosting Framework for Short-term Traffic State Estimation	Feb 21, 2023	Benchmarking, State Estimation	Code Available
ACCORD: Closing the Commonsense Measurability Gap	Jun 4, 2024	Benchmarking, Common Sense Reasoning	Code Available
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions	Jul 28, 2017	Autonomous Vehicles, Benchmarking	Code Available
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness	Oct 28, 2023	Benchmarking, image-classification	Code Available
Benchmarking Instance-Centric Counterfactual Algorithms for XAI: From White Box to Black Box	Mar 4, 2022	Benchmarking, counterfactual	Code Available


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified