Benchmarking

Papers

Showing 3051–3100 of 5548 papers

Title	Date	Tasks	Status
Evaluating Music Recommender Systems for Groups	Jul 31, 2017	Benchmarking, Recommendation Systems	Unverified
Evaluating Nuanced Bias in Large Language Model Free Response Answers	Jul 11, 2024	Benchmarking, Language Modeling	Unverified
Evaluating Robustness of LLMs on Crisis-Related Microblogs across Events, Information Types, and Linguistic Features	Dec 8, 2024	Benchmarking	Unverified
Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning	Oct 15, 2023	Benchmarking, Spatial Reasoning	Unverified
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance	Mar 27, 2025	Benchmarking, Image Generation	Unverified
Evaluating the Generation of Spatial Relations in Text and Image Generative Models	Nov 12, 2024	Benchmarking, Image Generation	Unverified
Evaluating the Performance of Large Language Models via Debates	Jun 16, 2024	Benchmarking	Unverified
Evaluating Visual Conversational Agents via Cooperative Human-AI Games	Aug 17, 2017	Benchmarking	Unverified
Evaluation and Ensembling of Methods for Reverse Engineering of Brain Connectivity from Imaging Data	Mar 15, 2016	Benchmarking, Causal Discovery	Unverified
Evaluation Methodology for Attacks Against Confidence Thresholding Models	May 1, 2019	Adversarial Robustness, Benchmarking	Unverified
Evaluation Methods and Measures for Causal Learning Algorithms	Feb 7, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge	Feb 21, 2019	Anatomy, Benchmarking	Unverified
Evaluation of Architectural Synthesis Using Generative AI	Mar 4, 2025	Benchmarking	Unverified
Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi	Jul 15, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
Evaluation of Popular XAI Applied to Clinical Prediction Models: Can They be Trusted?	Jun 21, 2023	Benchmarking, Explainable artificial intelligence	Unverified
Evaluation of simulation methods for tumor subclonal reconstruction	Feb 14, 2024	Benchmarking	Unverified
Evaluation of Three Welsh Language POS Taggers	Jun 1, 2022	Benchmarking, POS	Unverified
EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation	Mar 24, 2025	Benchmarking, Data Augmentation	Unverified
EventAid: Benchmarking Event-aided Image/Video Enhancement Algorithms with Real-captured Hybrid Dataset	Dec 13, 2023	Benchmarking, Deblurring	Unverified
Event-based Continuous Color Video Decompression from Single Frames	Nov 30, 2023	Benchmarking	Unverified
Event-based Feature Extraction Using Adaptive Selection Thresholds	Jul 18, 2019	Benchmarking	Unverified
Event Camera Simulator Design for Modeling Attention-based Inference Architectures	May 3, 2021	Benchmarking	Unverified
Eventprop training for efficient neuromorphic applications	Mar 6, 2025	Benchmarking, GPU	Unverified
EvEntS ReaLM: Event Reasoning of Entity States via Language Models	Nov 10, 2022	Benchmarking	Unverified
Evetac: An Event-based Optical Tactile Sensor for Robotic Manipulation	Dec 2, 2023	Benchmarking	Unverified
Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking	Mar 11, 2025	Benchmarking	Unverified
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified
Evolutionary Multimodal Optimization: A Short Survey	Aug 3, 2015	Benchmarking, Diversity	Unverified
Evolving Evolutionary Algorithms using Linear Genetic Programming	Aug 21, 2021	Benchmarking, Evolutionary Algorithms	Unverified
Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms	Jan 30, 2025	Benchmarking, Combinatorial Optimization	Unverified
EVOPS Benchmark: Evaluation of Plane Segmentation from RGBD and LiDAR Data	Apr 12, 2022	Benchmarking, Segmentation	Unverified
Exact lattice-based stochastic cell culture simulation algorithms incorporating spontaneous and contact-dependent reactions	Aug 9, 2022	Benchmarking, Cultural Vocal Bursts Intensity Prediction	Unverified
Exact Mean Computation in Dynamic Time Warping Spaces	Oct 24, 2017	Benchmarking, Dynamic Time Warping	Unverified
EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods	May 20, 2024	Benchmarking, Explainable artificial intelligence	Unverified
Examining convolutional feature extraction using Maximum Entropy (ME) and Signal-to-Noise Ratio (SNR) for image classification	May 10, 2021	Benchmarking, image-classification	Unverified
Experimental Benchmarking of Energy-saving Sub-Optimal Sliding Mode Control	Jul 14, 2024	Benchmarking	Unverified
Experimental robustness benchmark of quantum neural network on a superconducting quantum processor	May 22, 2025	Adversarial Attack, Adversarial Robustness	Unverified
Experimenting with robotic intra-logistics domains	Apr 26, 2018	Benchmarking, valid	Unverified
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists	Jun 2, 2025	Benchmarking, Form	Unverified
Explainable AI using expressive Boolean formulas	Jun 6, 2023	Benchmarking, Explainable Artificial Intelligence (XAI)	Unverified
Explainable Rumor Detection using Inter and Intra-feature Attention Networks	Jul 21, 2020	Benchmarking	Unverified
Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach	May 20, 2025	Benchmarking	Unverified
Explicitly Multi-Modal Benchmarks for Multi-Objective Optimization	Oct 7, 2021	Benchmarking	Unverified
Exploitation-Guided Exploration for Semantic Embodied Navigation	Nov 6, 2023	Benchmarking	Unverified
Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks	Mar 26, 2021	Benchmarking	Unverified
Exploiting Database Management Systems and Treewidth for Counting	Jan 13, 2020	Benchmarking, Management	Unverified
Exploration of TPUs for AI Applications	Sep 16, 2023	Benchmarking, Edge-computing	Unverified
Exploring and Benchmarking the Planning Capabilities of Large Language Models	Jun 18, 2024	Benchmarking, In-Context Learning	Unverified
Exploring Capabilities of Time Series Foundation Models in Building Analytics	Oct 28, 2024	Benchmarking, energy management	Unverified
Exploring Continual Learning of Diffusion Models	Mar 27, 2023	Benchmarking, Continual Learning	Unverified

Page 62 of 111

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified