Benchmarking

Papers

Showing 5401–5450 of 5548 papers

Title	Date	Tasks	Status
Probing Acoustic Representations for Phonetic Properties	Oct 25, 2020	Benchmarking, speech-recognition	Code Available
Probing Conceptual Understanding of Large Visual-Language Models	Apr 7, 2023	Benchmarking	Code Available
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection	Feb 3, 2024	Benchmarking, Hate Speech Detection	Code Available
Using Color To Identify Insider Threats	Nov 25, 2021	Benchmarking	Code Available
An Exploration of Exploration: Measuring the ability of lexicase selection to find obscure pathways to optimality	Jul 20, 2021	Benchmarking, Diagnostic	Code Available
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis	Apr 17, 2023	Benchmarking, Drift Detection	Code Available
Transfer Learning between Motor Imagery Datasets using Deep Learning -- Validation of Framework and Comparison of Datasets	Sep 4, 2023	Benchmarking, Motor Imagery	Code Available
Synth4bench: a framework for generating synthetic genomics data for the evaluation of tumor-only somatic variant calling algorithms	Mar 8, 2024	Benchmarking, Synthetic Data Generation	Code Available
Process Extraction from Text: Benchmarking the State of the Art and Paving the Way for Future Challenges	Oct 7, 2021	Benchmarking, Model extraction	Code Available
Transfer Learning for Prosthetics Using Imitation Learning	Jan 15, 2019	Benchmarking, Imitation Learning	Code Available
Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives	Nov 13, 2018	Benchmarking, Intrusion Detection	Code Available
Synthetic Datasets for Machine Learning on Spatio-Temporal Graphs using PDEs	Feb 6, 2025	Benchmarking, Epidemiology	Code Available
Synthetic location trajectory generation using categorical diffusion models	Feb 19, 2024	Benchmarking, Decision Making	Code Available
Synthetic Porous Microstructures: Automatic Design, Simulation, and Permeability Analysis	Feb 20, 2025	Benchmarking	Code Available
Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks	May 26, 2025	Benchmarking, Decision Making Under Uncertainty	Code Available
An Experimental Study of the Transferability of Spectral Graph Networks	Dec 18, 2020	Benchmarking, General Classification	Code Available
Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task	Mar 6, 2024	Benchmarking	Code Available
Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as Science Communicators	Sep 21, 2024	Benchmarking	Code Available
Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning	Oct 9, 2024	Benchmarking, Fairness	Code Available
Comparing Machine Learning Algorithms by Union-Free Generic Depth	Dec 20, 2023	Benchmarking	Code Available
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists	Aug 30, 2024	Benchmarking, Sentiment Analysis	Code Available
Transformation-Interaction-Rational Representation for Symbolic Regression	Apr 25, 2022	Benchmarking, Form	Code Available
Towards Enhancing Fault Tolerance in Neural Networks	Jul 6, 2019	Benchmarking	Code Available
Robust Model-Based Optimization for Challenging Fitness Landscapes	May 23, 2023	Benchmarking, model	Code Available
Air Learning: A Deep Reinforcement Learning Gym for Autonomous Aerial Robot Visual Navigation	Jun 2, 2019	Benchmarking, Deep Reinforcement Learning	Code Available
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	Benchmarking, CPU	Code Available
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry	Oct 15, 2024	Benchmarking	Code Available
ViP: Video Platform for PyTorch	Oct 7, 2019	Benchmarking, Video Understanding	Code Available
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language	May 15, 2025	Benchmarking, Optical Character Recognition	Code Available
Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification	Feb 8, 2022	Anomaly Detection, Benchmarking	Code Available
Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies	Sep 9, 2021	Benchmarking, Federated Learning	Code Available
Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks	Oct 25, 2021	Benchmarking, continuous-control	Code Available
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation	Dec 4, 2020	Benchmarking, Machine Translation	Code Available
Comparative Analysis: Violence Recognition from Videos using Transfer Learning	Aug 26, 2024	Action Recognition, Benchmarking	Code Available
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis	Mar 18, 2025	Benchmarking, Drug Response Prediction	Code Available
Compact Trilinear Interaction for Visual Question Answering	Sep 26, 2019	Benchmarking, Knowledge Distillation	Code Available
Benchmarking Classic and Learned Navigation in Complex 3D Environments	Jan 30, 2019	Benchmarking	Code Available
An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data	Dec 6, 2024	Benchmarking, Imputation	Code Available
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models	Jun 15, 2024	Benchmarking, Data Augmentation	Code Available
VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models	Feb 23, 2025	Benchmarking, Spatial Reasoning	Code Available
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance	Jan 17, 2025	Benchmarking, Multi-agent Reinforcement Learning	Code Available
CODES: Benchmarking Coupled ODE Surrogates	Oct 28, 2024	Benchmarking, Uncertainty Quantification	Code Available
CodeS: Towards Code Model Generalization Under Distribution Shift	Jun 11, 2022	Benchmarking, Code Classification	Code Available
Code Ownership in Open-Source AI Software Security	Dec 18, 2023	Benchmarking	Code Available
Benchmarking ChatGPT on Algorithmic Reasoning	Apr 4, 2024	Benchmarking	Code Available
COCO: Performance Assessment	May 11, 2016	Benchmarking	Code Available
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)	Apr 5, 2024	Benchmarking	Code Available
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology	Apr 24, 2023	Benchmarking, Decision Making	Code Available
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs	May 21, 2025	Benchmarking, Question Answering	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified