Benchmarking

Papers

Showing 5201–5250 of 5548 papers

Title	Date	Tasks	Status
Efficiently Quantifying Individual Agent Importance in Cooperative MARL	Dec 13, 2023	Benchmarking, Multi-agent Reinforcement Learning	Unverified
SysML'19 demo: customizable and reusable Collective Knowledge pipelines to automate and reproduce machine learning experiments	Mar 31, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency	Jul 1, 2023	Benchmarking, Data Augmentation	Unverified
Class-agnostic Object Detection	Nov 28, 2020	Benchmarking, Class-agnostic Object Detection	Unverified
Efficient Processing of Deep Neural Networks: A Tutorial and Survey	Mar 27, 2017	Benchmarking, speech-recognition	Unverified
Systematic Comparison of Path Planning Algorithms using PathBench	Mar 7, 2022	Benchmarking	Unverified
Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification	Sep 12, 2024	Benchmarking, Classification	Unverified
EfficientSRFace: An Efficient Network with Super-Resolution Enhancement for Accurate Face Detection	Jun 4, 2023	Benchmarking, Face Detection	Unverified
Efficient Training of Deep Classifiers for Wireless Source Identification using Test SNR Estimates	Dec 26, 2019	Benchmarking	Unverified
A Line-of-Sight Channel Model for the 100-450 Gigahertz Frequency Band	Feb 12, 2020	Benchmarking	Unverified
Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles	May 4, 2024	Anomaly Detection, Articles	Unverified
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives	Apr 15, 2025	Benchmarking	Unverified
Egocentric Human-Object Interaction Detection: A New Benchmark and Method	Jun 17, 2025	Benchmarking, Human-Object Interaction Detection	Unverified
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering	Aug 1, 2023	Benchmarking, Clustering	Unverified
CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities	May 2, 2024	Benchmarking, Management	Unverified
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision	Sep 3, 2024	Benchmarking, Mixed Reality	Unverified
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry	Jan 26, 2025	Benchmarking, Object Detection	Unverified
EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations	Oct 3, 2023	Atomic Forces, Benchmarking	Unverified
CIMLA: Interpretable AI for inference of differential causal networks	Apr 25, 2023	Benchmarking	Unverified
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis	Oct 6, 2023	Benchmarking, Domain Generalization	Unverified
ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"	Feb 10, 2019	Benchmarking, Clustering	Unverified
ELSA: Evaluating Localization of Social Activities in Urban Streets using Open-Vocabulary Detection	Jun 3, 2024	Action Recognition, Benchmarking	Unverified
Embarrassingly Simple Scribble Supervision for 3D Medical Segmentation	Mar 19, 2024	Benchmarking, Segmentation	Unverified
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data	Sep 20, 2024	Benchmarking, Language Modeling	Unverified
Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework	Apr 5, 2017	Benchmarking, Board Games	Unverified
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Feb 13, 2025	Benchmarking	Unverified
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools	Jan 1, 2025	Benchmarking	Unverified
Emerging Approaches for THz Array Imaging: A Tutorial Review and Software Tool	Sep 16, 2023	Benchmarking, Image Super-Resolution	Unverified
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices	Jan 1, 2025	Benchmarking	Unverified
ChemTime: Rapid and Early Classification for Multivariate Time Series Classification of Chemical Sensors	Dec 15, 2023	Benchmarking, Classification	Unverified
Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description	Oct 2, 2024	Benchmarking, Facial expression generation	Unverified
EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models	Feb 6, 2025	Benchmarking, Emotional Intelligence	Unverified
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models	May 18, 2025	Articles, Benchmarking	Unverified
Emotion Analysis of Tweets Banning Education in Afghanistan	Jun 28, 2023	Benchmarking, Emotion Classification	Unverified
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task	Apr 27, 2023	Articles, Benchmarking	Unverified
Empirical Analysis of Privacy-Fairness-Accuracy Trade-offs in Federated Learning: A Step Towards Responsible AI	Mar 20, 2025	Benchmarking, Fairness	Unverified
Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler	Apr 24, 2024	Benchmarking	Unverified
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices	Jun 6, 2024	Benchmarking, RAG	Unverified
Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification	Apr 23, 2023	Benchmarking, Data Augmentation	Unverified
ChatGPT Alternative Solutions: Large Language Models Survey	Mar 21, 2024	Benchmarking, Chatbot	Unverified
SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference	May 19, 2025	Benchmarking, EEG	Unverified
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts	Dec 5, 2024	Benchmarking, Image Generation	Unverified
Enabling Accelerators for Graph Computing	Dec 16, 2023	Benchmarking	Unverified
Automated Machine Learning: A Case Study on Non-Intrusive Appliance Load Monitoring	Mar 6, 2022	AutoML, Bayesian Optimization	Unverified
Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design	Mar 25, 2021	Benchmarking, Edge-computing	Unverified
Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts	May 23, 2025	Benchmarking	Unverified
EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting	Jul 1, 2024	3D Reconstruction, Benchmarking	Unverified
CHaRNet: Conditioned Heatmap Regression for Robust Dental Landmark Localization	Jan 22, 2025	Benchmarking, regression	Unverified
Characterizing Transactional Databases for Frequent Itemset Mining	Nov 9, 2020	Benchmarking	Unverified
1-D Convlutional Neural Networks for the Analysis of Pupil Size Variations in Scotopic Conditions	Feb 6, 2020	Benchmarking, Binary Classification	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified