Benchmarking

Papers

Showing 3951–4000 of 5548 papers

Title	Date	Tasks	Status
Decisions and Performance Under Bounded Rationality: A Computational Benchmarking Approach	May 26, 2020	Benchmarking, Decision Making	Unverified
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share	Jan 27, 2025	Benchmarking, Transfer Learning	Unverified
What Will it Take to Fix Benchmarking in Natural Language Understanding?	Apr 5, 2021	Benchmarking, Natural Language Understanding	Unverified
Transformed Subspace Clustering	Dec 10, 2019	Benchmarking, Clustering	Unverified
On the Evaluation of Speech Foundation Models for Spoken Language Understanding	Jun 14, 2024	Benchmarking, Prediction	Unverified
On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel	Aug 1, 2022	Benchmarking, image-classification	Unverified
Transformers in Protein: A Survey	May 26, 2025	Benchmarking, Drug Discovery	Unverified
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics	Apr 21, 2022	Attribute, Benchmarking	Unverified
On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks	Apr 29, 2024	Benchmarking, Federated Learning	Unverified
Broadening the Scope of Neural Network Potentials through Direct Inclusion of Additional Molecular Attributes	Mar 22, 2024	Benchmarking	Unverified
On the Interaction of Belief Bias and Explanations	Jun 29, 2021	Benchmarking	Unverified
Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark	May 16, 2025	Anomaly Detection, Benchmarking	Unverified
On the Performance of Multimodal Language Models	Oct 4, 2023	Benchmarking, Binary Classification	Unverified
On the Potential of Large Language Models to Solve Semantics-Aware Process Mining Tasks	Apr 29, 2025	Anomaly Detection, Benchmarking	Unverified
On the project risk baseline: integrating aleatory uncertainty into project scheduling	May 31, 2024	Benchmarking, Scheduling	Unverified
On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild	Jul 17, 2023	Benchmarking, Real-Time Semantic Segmentation	Unverified
On the reduction of Linear Parameter-Varying State-Space models	Apr 2, 2024	Benchmarking, Dimensionality Reduction	Unverified
On the relationship between Benchmarking, Standards and Certification in Robotics and AI	Sep 21, 2023	Benchmarking	Unverified
On the Reliability and Validity of Detecting Approval of Political Actors in Tweets	Nov 1, 2020	Benchmarking, Sentiment Analysis	Unverified
On the Robustness of Human-Object Interaction Detection against Distribution Shift	Jun 22, 2025	Benchmarking, Data Augmentation	Unverified
On the role of benchmarking data sets and simulations in method comparison studies	Aug 2, 2022	Benchmarking	Unverified
Optimizer Benchmarking Needs to Account for Hyperparameter Tuning	Oct 25, 2019	Benchmarking	Unverified
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends	Oct 5, 2024	Benchmarking, Chart Understanding	Unverified
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning	Oct 14, 2024	Atari Games, Benchmarking	Unverified
Translation Canvas: An Explainable Interface to Pinpoint and Analyze Translation Systems	Oct 7, 2024	Benchmarking, Machine Translation	Unverified
On the Use of Quality Diversity Algorithms for The Traveling Thief Problem	Dec 16, 2021	Benchmarking, Diversity	Unverified
On the Utility of Equivariance and Symmetry Breaking in Deep Learning Architectures on Point Clouds	Jan 1, 2025	Benchmarking	Unverified
On the Value of ML Models	Dec 13, 2021	Benchmarking	Unverified
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation	Jul 1, 2025	Benchmarking, Machine Translation	Unverified
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving	Dec 6, 2024	Autonomous Driving, Benchmarking	Unverified
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images	Apr 17, 2023	3D Pose Estimation, Benchmarking	Unverified
OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations	Dec 3, 2024	Benchmarking, Face Recognition	Unverified
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics	Sep 17, 2021	Attribute, Benchmarking	Unverified
OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking	May 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers	Sep 11, 2024	Benchmarking	Unverified
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing	May 22, 2025	Benchmarking	Unverified
Benchmarking and Performance Modelling of MapReduce Communication Pattern	May 23, 2020	Benchmarking	Unverified
TransOpt: Transformer-based Representation Learning for Optimization Problem Classification	Nov 29, 2023	Benchmarking, Classification	Unverified
Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms	Sep 12, 2018	Bayesian Optimization, Benchmarking	Unverified
Open-CD: A Comprehensive Toolbox for Change Detection	Jul 22, 2024	Benchmarking, Change Detection	Unverified
Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation	Dec 15, 2024	3D Generation, Benchmarking	Unverified
OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI	Apr 4, 2023	Benchmarking	Unverified
Open Datasets for Satellite Radio Resource Control	Apr 22, 2024	Benchmarking, Decision Making	Unverified
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors	Sep 29, 2023	Benchmarking, Computational Efficiency	Unverified
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation	Apr 18, 2025	Benchmarking	Unverified
TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models	Jan 9, 2024	Benchmarking	Unverified
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets	May 16, 2025	Benchmarking, Knowledge Graphs	Unverified
OpenDPD: An Open-Source End-to-End Learning & Benchmarking Framework for Wideband Power Amplifier Modeling and Digital Pre-Distortion	Jan 16, 2024	Benchmarking	Unverified
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety	Mar 18, 2024	Benchmarking, Mathematical Reasoning	Unverified
Benchmarking and Improving Generator-Validator Consistency of Language Models	Oct 3, 2023	Benchmarking, Instruction Following	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified