Benchmarking

Papers

Showing 4301–4350 of 5548 papers

Title	Date	Tasks	Status
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI	Jan 13, 2025	ARC, Benchmarking	Unverified
Quantifying Social Biases Using Templates is Unreliable	Oct 9, 2022	Attribute, Benchmarking	Unverified
Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction	May 28, 2020	Benchmarking, Prediction	Unverified
Quantifying the Impact of Boundary Constraint Handling Methods on Differential Evolution	May 14, 2021	Benchmarking	Unverified
A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification	Feb 14, 2020	Benchmarking, Classification	Unverified
Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology	Jun 24, 2025	Anomaly Detection, Artifact Detection	Unverified
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking	May 26, 2025	Benchmarking, Optical Flow Estimation	Unverified
Quantitative evaluation of brain-inspired vision sensors in high-speed robotic perception	Apr 27, 2025	Benchmarking, Event-based vision	Unverified
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values	Jun 5, 2025	Benchmarking	Unverified
Understanding Foundation Models: Are We Back in 1924?	Sep 11, 2024	Benchmarking	Unverified
Quantitative Metrics for Benchmarking Medical Image Harmonization	Feb 6, 2024	Anatomy, Benchmarking	Unverified
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models	Feb 21, 2024	Benchmarking, Image to text	Unverified
Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models	Sep 8, 2016	Benchmarking, BIG-bench Machine Learning	Unverified
Understanding or Manipulation: Rethinking Online Performance Gains of Modern Recommender Systems	Oct 11, 2022	Benchmarking, Recommendation Systems	Unverified
Quantum classification of the MNIST dataset with Slow Feature Analysis	May 22, 2018	Benchmarking, Classification	Unverified
Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis	Jan 12, 2021	Benchmarking, Decision Making	Unverified
A Comparison of Directional Distances for Hand Pose Estimation	Apr 3, 2017	Benchmarking, Hand Pose Estimation	Unverified
Quantum Kernel Methods under Scrutiny: A Benchmarking Study	Sep 6, 2024	Benchmarking, Quantum Machine Learning	Unverified
Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting	Oct 25, 2023	Benchmarking, Hyperparameter Optimization	Unverified
Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact	Sep 17, 2024	Benchmarking, Quantum Machine Learning	Unverified
Quantum-tunnelling deep neural network for optical illusion recognition	Jun 26, 2024	Autonomous Vehicles, Benchmarking	Unverified
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture	Jan 3, 2025	Benchmarking, Question Answering	Unverified
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach	Apr 2, 2024	Benchmarking, Common Sense Reasoning	Unverified
Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets	Oct 6, 2018	Benchmarking, Language Modeling	Unverified
Yet Another ADNI Machine Learning Paper? Paving The Way Towards Fully-reproducible Research on Classification of Alzheimer's Disease	Sep 21, 2017	Benchmarking, Classification	Unverified
Understanding the Limits of Lifelong Knowledge Editing in LLMs	Mar 7, 2025	Benchmarking, knowledge editing	Unverified
Who Wins the Game of Thrones? How Sentiments Improve the Prediction of Candidate Choice	Feb 29, 2020	Benchmarking, Holdout Set	Unverified
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective	Jun 19, 2024	Benchmarking, Continual Pretraining	Unverified
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture	Apr 21, 2025	Benchmarking, class-incremental learning	Unverified
A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain	Oct 31, 2023	Benchmarking, Diagnostic	Unverified
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models	Jun 3, 2024	Benchmarking, Code Completion	Unverified
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests	May 23, 2023	Benchmarking, Language Modeling	Unverified
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation	May 29, 2025	Benchmarking, Image Generation	Unverified
R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery	Jul 12, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking	Sep 29, 2021	Benchmarking, Multi-Task Learning	Unverified
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR	Nov 23, 2021	Benchmarking, Computed Tomography (CT)	Unverified
A tutorial on multi-view autoencoders using the multi-view-AE library	Mar 12, 2024	Benchmarking	Unverified
Understanding the User: An Intent-Based Ranking Dataset	Aug 30, 2024	Benchmarking, Information Retrieval	Unverified
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems	Jun 25, 2024	Benchmarking, RAG	Unverified
Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking	Jan 8, 2024	Benchmarking, Contrastive Learning	Unverified
A Theory of Dynamic Benchmarks	Oct 6, 2022	Benchmarking	Unverified
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Jan 22, 2025	Benchmarking, Hallucination	Unverified
Rail-5k: a Real-World Dataset for Rail Surface Defects Detection	Jun 28, 2021	4k, Benchmarking	Unverified
On the Evaluation of Engineering Artificial General Intelligence	May 15, 2025	Benchmarking	Unverified
A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality	Apr 5, 2022	Benchmarking, Self-Supervised Learning	Unverified
RAN-GNNs: breaking the capacity limits of graph neural networks	Mar 29, 2021	Attribute, Benchmarking	Unverified
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified
A Comparison of Cryptocurrency Volatility-benchmarking New and Mature Asset Classes	Apr 7, 2024	Benchmarking	Unverified
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games	Aug 28, 2024	Atari Games, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified