Benchmarking

Papers

Title	Date	Tasks	Status
Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet	Apr 2, 2025	Benchmarking	Unverified
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Sep 25, 2024	Benchmarking, Formal Logic	Unverified
A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design	May 19, 2025	Benchmarking, Drug Discovery	Unverified
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation	Feb 10, 2024	Benchmarking, Language Modeling	Unverified
Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis	Jan 21, 2021	Benchmarking	Unverified
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking	May 13, 2022	Benchmarking, reinforcement-learning	Unverified
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	Benchmarking, Multiple-choice	Unverified
UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library	Aug 20, 2024	Benchmarking, Computational Efficiency	Unverified
Automatic detection of passable roads after floods in remote sensed and social media data	Jan 10, 2019	Benchmarking, Transfer Learning	Unverified
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice	Feb 28, 2025	Benchmarking, Diagnostic	Unverified
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents	Jan 3, 2025	Benchmarking	Unverified
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	Benchmarking, Denoising	Unverified
Automated Structured Radiology Report Generation	May 30, 2025	Benchmarking	Unverified
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration	Jun 9, 2023	Benchmarking, Time Series	Unverified
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified
Automated Machine Learning on Big Data using Stochastic Algorithm Tuning	Jul 30, 2014	Bayesian Optimisation, Benchmarking	Unverified
Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing	Jun 27, 2023	Benchmarking	Unverified
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension	Dec 16, 2024	Benchmarking, Image Captioning	Unverified
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models	Dec 30, 2023	Benchmarking, image-classification	Unverified
Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A	Jun 1, 2015	Benchmarking, Face Detection	Unverified
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	Benchmarking, Legal Reasoning	Unverified
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models	Apr 1, 2025	Benchmarking, Conversational Question Answering	Unverified
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem	Jul 13, 2024	Benchmarking, Deep Learning	Unverified
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN)	Nov 23, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy	Mar 18, 2021	Art Analysis, Benchmarking	Unverified
AutoLay: Benchmarking amodal layout estimation for autonomous driving	Aug 20, 2021	Amodal Layout Estimation, Autonomous Driving	Unverified
Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case	Jun 16, 2022	Benchmarking, Density Estimation	Unverified
Python Random Graph Generator	Sep 20, 2017	Benchmarking, Graph Generation	Unverified
Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery	Jun 17, 2025	Benchmarking, Drug Discovery	Unverified
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs	Sep 30, 2024	Benchmarking, Multiple-choice	Unverified
AutoAI-TS: AutoAI for Time Series Forecasting	Feb 24, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
QDA^2: A principled approach to automatically annotating charge stability diagrams	Dec 18, 2023	Benchmarking	Unverified
A Universal Protocol to Benchmark Camera Calibration for Sports	Apr 15, 2024	Benchmarking, Camera Calibration	Unverified
A Unified Taylor Framework for Revisiting Attribution Methods	Aug 21, 2020	Benchmarking, Decision Making	Unverified
A Complementarity Analysis of the COCO Benchmark Problems and Artificially Generated Problems	Apr 27, 2021	Benchmarking	Unverified
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges	Jun 24, 2025	Benchmarking, Code Generation	Unverified
A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation	Nov 9, 2016	Benchmarking, Translation	Unverified
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning	Aug 20, 2024	Benchmarking, Language Modelling	Unverified
QSAM-Net: Rain streak removal by quaternion neural network with self-attention module	Aug 8, 2022	Benchmarking, object-detection	Unverified
Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs	Feb 24, 2024	Benchmarking, Knowledge Graphs	Unverified
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation	May 8, 2025	Benchmarking, Federated Learning	Unverified
Unbounded Bayesian Optimization via Regularization	Aug 14, 2015	Bayesian Optimization, Benchmarking	Unverified
Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling	Sep 24, 2024	Articles, Benchmarking	Unverified
Quality Assessment of Low Light Restored Images: A Subjective Study and an Unsupervised Model	Feb 4, 2022	Benchmarking, Contrastive Learning	Unverified
Quality Assured: Rethinking Annotation Strategies in Imaging AI	Jul 24, 2024	Benchmarking	Unverified
Quality at the Tail of Machine Learning Inference	Dec 25, 2022	Autonomous Driving, Benchmarking	Unverified
Uncertainty estimation for Cross-dataset performance in Trajectory prediction	May 15, 2022	Benchmarking, Prediction	Unverified
A Unified Study of Machine Learning Explanation Evaluation Metrics	Mar 27, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
QuantBench: Benchmarking AI Methods for Quantitative Investment	Apr 24, 2025	Benchmarking, Continual Learning	Unverified
Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling	Dec 15, 2020	Benchmarking, Deep Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified