Benchmarking

Papers


Showing 1851–1900 of 5548 papers

Title	Date	Tasks	Status
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance	Aug 4, 2024	Action Anticipation, Benchmarking	Unverified
Consolidating and Developing Benchmarking Datasets for the Nepali Natural Language Understanding Tasks	Nov 28, 2024	Benchmarking, Natural Language Inference	Unverified
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task	Apr 27, 2023	Articles, Benchmarking	Unverified
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics	Apr 21, 2022	Attribute, Benchmarking	Unverified
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods	Oct 3, 2023	Benchmarking, text-guided-image-editing	Unverified
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics	Sep 17, 2021	Attribute, Benchmarking	Unverified
ChatGPT Alternative Solutions: Large Language Models Survey	Mar 21, 2024	Benchmarking, Chatbot	Unverified
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets	Dec 2, 2023	Benchmarking	Unverified
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts	May 23, 2025	Benchmarking	Unverified
Benchmarking Deep Learning Architectures for Urban Vegetation Point Cloud Semantic Segmentation from MLS	Jun 17, 2023	Benchmarking, Segmentation	Unverified
Context-guided Triple Matching for Multiple Choice Question Answering	Sep 27, 2021	Benchmarking, Multiple-choice	Unverified
Context-guided Triple Matching for Multiple Choice Question Answering	Jan 16, 2022	Benchmarking, Multiple-choice	Unverified
EdgeMark: An Automation and Benchmarking System for Embedded Artificial Intelligence Tools	Feb 3, 2025	Benchmarking	Unverified
Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection	Jun 28, 2023	Benchmarking, Diversity	Unverified
Exploring the Practicality of Generative Retrieval on Dynamic Corpora	May 27, 2023	Benchmarking, Information Retrieval	Unverified
Continuous Function Structured in Multilayer Perceptron for Global Optimization	Mar 7, 2023	Benchmarking, global-optimization	Unverified
Benchmarking Deep Learning Frameworks for Automated Diagnosis of Ocular Toxoplasmosis: A Comprehensive Approach to Classification and Segmentation	May 18, 2023	Benchmarking, Diagnostic	Unverified
Continuous-Time Gaussian Process Motion-Compensation for Event-vision Pattern Tracking with Distance Fields	Mar 5, 2023	Benchmarking, Motion Compensation	Unverified
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack	Mar 18, 2025	8k, Benchmarking	Unverified
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation	Dec 4, 2023	Benchmarking, Contrastive Learning	Unverified
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization	Jan 22, 2025	Benchmarking, regression	Unverified
Characterizing Transactional Databases for Frequent Itemset Mining	Nov 9, 2020	Benchmarking	Unverified
Benchmarking and Validation of Sub-mW 30GHz VG-LNAs in 22nm FDSOI CMOS for 5G/6G Phased-Array Receivers	Sep 11, 2024	Benchmarking	Unverified
Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices	Sep 25, 2024	Autonomous Vehicles, Benchmarking	Unverified
Characterizing the adversarial vulnerability of speech self-supervised learning	Nov 8, 2021	Adversarial Robustness, Benchmarking	Unverified
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments	Jun 9, 2025	Benchmarking, Navigate	Unverified
Characterizing Missing Information in Deep Networks Using Backpropagated Gradients	Jan 1, 2020	Anomaly Detection, Attribute	Unverified
Convolutional and Deep Learning based techniques for Time Series Ordinal Classification	Jun 16, 2023	Benchmarking, Ordinal Classification	Unverified
COPA: Comparing the Incomparable to Explore the Pareto Front	Mar 18, 2025	AutoML, Benchmarking	Unverified
An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification	Nov 24, 2023	Benchmarking, image-classification	Unverified
Characterization of Multiple 3D LiDARs for Localization and Mapping using Normal Distributions Transform	Apr 3, 2020	Benchmarking	Unverified
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective	Feb 4, 2023	Benchmarking, Multiobjective Optimization	Unverified
EconGym: A Scalable AI Testbed with Diverse Economic Tasks	Jun 13, 2025	Benchmarking	Unverified
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Feature Space Perspective	Sep 9, 2021	Benchmarking, Multiobjective Optimization	Unverified
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models	Jun 16, 2022	Benchmarking, Language Modeling	Unverified
Cornac: A Comparative Framework for Multimodal Recommender Systems	May 8, 2020	Benchmarking, Recommendation Systems	Unverified
Benchmarking and Pushing the Multi-Bias Elimination Boundary of LLMs via Causal Effect Estimation-guided Debiasing	May 22, 2025	Benchmarking	Unverified
An Optical Frontend for a Convolutional Neural Network	Dec 23, 2018	Benchmarking	Unverified
Benchmarking and Performance Modelling of MapReduce Communication Pattern	May 23, 2020	Benchmarking	Unverified
CoSy: Evaluating Textual Explanations of Neurons	May 30, 2024	Benchmarking	Unverified
ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects	Nov 12, 2021	Benchmarking, Causal Inference	Unverified
Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies	Nov 17, 2024	Benchmarking	Unverified
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts	Apr 14, 2025	Benchmarking, Object	Unverified
Coupling volume-excluding compartment-based models of diffusion at different scales: Voronoi and pseudo-compartment approaches	May 24, 2016	Benchmarking, Blocking	Unverified
Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution	Jun 2, 2020	Benchmarking, Depth Map Super-Resolution	Unverified
Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms	Sep 12, 2018	Bayesian Optimization, Benchmarking	Unverified
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity	Jun 28, 2023	Benchmarking, Image Captioning	Unverified
ECG-Adv-GAN: Detecting ECG Adversarial Examples with Conditional Generative Adversarial Networks	Jul 16, 2021	Benchmarking, Generative Adversarial Network	Unverified
ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph	Mar 20, 2025	Benchmarking, Hallucination	Unverified
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey	May 3, 2025	Autonomous Driving, Benchmarking	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified