Benchmarking

Papers

Showing 3751–3800 of 5548 papers

Title	Date	Tasks	Status
Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery	Jun 17, 2025	Benchmarking, Drug Discovery	Unverified
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	Benchmarking, Multiple-choice	Unverified
QDA^2: A principled approach to automatically annotating charge stability diagrams	Dec 18, 2023	Benchmarking	Unverified
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges	Jun 24, 2025	Benchmarking, Code Generation	Unverified
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning	Aug 20, 2024	Benchmarking, Language Modelling	Unverified
QSAM-Net: Rain streak removal by quaternion neural network with self-attention module	Aug 8, 2022	Benchmarking, object-detection	Unverified
Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs	Feb 24, 2024	Benchmarking, Knowledge Graphs	Unverified
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation	May 8, 2025	Benchmarking, Federated Learning	Unverified
Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling	Sep 24, 2024	Articles, Benchmarking	Unverified
Quality Assessment of Low Light Restored Images: A Subjective Study and an Unsupervised Model	Feb 4, 2022	Benchmarking, Contrastive Learning	Unverified
Quality Assured: Rethinking Annotation Strategies in Imaging AI	Jul 24, 2024	Benchmarking	Unverified
Quality at the Tail of Machine Learning Inference	Dec 25, 2022	Autonomous Driving, Benchmarking	Unverified
QuantBench: Benchmarking AI Methods for Quantitative Investment	Apr 24, 2025	Benchmarking, Continual Learning	Unverified
Quantifying Social Biases Using Templates is Unreliable	Oct 9, 2022	Attribute, Benchmarking	Unverified
Quantifying the Complexity of Standard Benchmarking Datasets for Long-Term Human Trajectory Prediction	May 28, 2020	Benchmarking, Prediction	Unverified
Quantifying the Impact of Boundary Constraint Handling Methods on Differential Evolution	May 14, 2021	Benchmarking	Unverified
Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology	Jun 24, 2025	Anomaly Detection, Artifact Detection	Unverified
Quantitative evaluation of brain-inspired vision sensors in high-speed robotic perception	Apr 27, 2025	Benchmarking, Event-based vision	Unverified
Quantitative Metrics for Benchmarking Medical Image Harmonization	Feb 6, 2024	Anatomy, Benchmarking	Unverified
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified
Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models	Sep 8, 2016	Benchmarking, BIG-bench Machine Learning	Unverified
Quantum classification of the MNIST dataset with Slow Feature Analysis	May 22, 2018	Benchmarking, Classification	Unverified
Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis	Jan 12, 2021	Benchmarking, Decision Making	Unverified
Quantum Kernel Methods under Scrutiny: A Benchmarking Study	Sep 6, 2024	Benchmarking, Quantum Machine Learning	Unverified
Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting	Oct 25, 2023	Benchmarking, Hyperparameter Optimization	Unverified
Quantum Kernel Learning for Small Dataset Modeling in Semiconductor Fabrication: Application to Ohmic Contact	Sep 17, 2024	Benchmarking, Quantum Machine Learning	Unverified
Quantum-tunnelling deep neural network for optical illusion recognition	Jun 26, 2024	Autonomous Vehicles, Benchmarking	Unverified
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture	Jan 3, 2025	Benchmarking, Question Answering	Unverified
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models	Jun 3, 2024	Benchmarking, Code Completion	Unverified
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests	May 23, 2023	Benchmarking, Language Modeling	Unverified
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation	May 29, 2025	Benchmarking, Image Generation	Unverified
R3L: Connecting Deep Reinforcement Learning to Recurrent Neural Networks for Image Denoising via Residual Recovery	Jul 12, 2021	Benchmarking, Deep Reinforcement Learning	Unverified
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR	Nov 23, 2021	Benchmarking, Computed Tomography (CT)	Unverified
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems	Jun 25, 2024	Benchmarking, RAG	Unverified
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Jan 22, 2025	Benchmarking, Hallucination	Unverified
Rail-5k: a Real-World Dataset for Rail Surface Defects Detection	Jun 28, 2021	4k, Benchmarking	Unverified
RAN-GNNs: breaking the capacity limits of graph neural networks	Mar 29, 2021	Attribute, Benchmarking	Unverified
Ransomware Detection Using Machine Learning in the Linux Kernel	Sep 10, 2024	Benchmarking	Unverified
RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration	Apr 9, 2025	3D Semantic Segmentation, Benchmarking	Unverified
RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks	Sep 9, 2024	Benchmarking, Click-Through Rate Prediction	Unverified
RCC-GAN: Regularized Compound Conditional GAN for Large-Scale Tabular Data Synthesis	May 24, 2022	Benchmarking, Generative Adversarial Network	Unverified
RDBench: ML Benchmark for Relational Databases	Oct 25, 2023	Benchmarking	Unverified
RD-Suite: A Benchmark for Ranking Distillation	Jun 7, 2023	Benchmarking	Unverified
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results	Jun 15, 2024	Benchmarking, HumanEval	Unverified
RealCause: Realistic Causal Inference Benchmarking	Nov 30, 2020	Benchmarking, Causal Inference	Unverified
Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection	Jul 19, 2024	Benchmarking, Model Selection	Unverified
Realistic Hair Simulation Using Image Blending	Apr 19, 2019	Benchmarking, Data Augmentation	Unverified
Realistic Video Summarization through VISIOCITY: A New Benchmark and Evaluation Framework	Jul 29, 2020	Benchmarking, Video Summarization	Unverified
Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results	Jun 9, 2021	Benchmarking, Mixed Reality	Unverified
Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset	Feb 24, 2020	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified