Benchmarking

Papers

Showing 2001–2050 of 5548 papers

Title	Date	Tasks	Status
Benchmarking and Improving Generator-Validator Consistency of Language Models	Oct 3, 2023	Benchmarking, Instruction Following	Unverified
Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors	Jun 21, 2024	Adversarial Defense, Adversarial Robustness	Unverified
Applicability and Challenges of Deep Reinforcement Learning for Satellite Frequency Plan Design	Oct 15, 2020	Benchmarking, Decision Making	Unverified
Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm)	Feb 9, 2025	Benchmarking, CPU	Unverified
Decoding the Diversity: A Review of the Indic AI Research Landscape	Jun 13, 2024	Benchmarking, Diversity	Unverified
Certifying almost all quantum states with few single-qubit measurements	Apr 10, 2024	All, Benchmarking	Unverified
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines	Dec 1, 2021	Adversarial Robustness, Benchmarking	Unverified
An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks	Sep 3, 2019	Benchmarking, speech-recognition	Unverified
CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs	Oct 22, 2020	Benchmarking, Cell Segmentation	Unverified
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech	Jun 9, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Establishing Reliability Metrics for Reward Models in Large Language Models	Apr 21, 2025	Benchmarking	Unverified
Deep Convolutional Generative Adversarial Network Based Food Recognition Using Partially Labeled Data	Dec 26, 2018	Benchmarking, Food Recognition	Unverified
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark	Jul 1, 2019	Benchmarking, Object Tracking	Unverified
Deep Crowd Anomaly Detection: State-of-the-Art, Challenges, and Future Research Directions	Oct 25, 2022	Anomaly Detection, Benchmarking	Unverified
An efficiency analysis of Spanish airports	Nov 8, 2023	Benchmarking	Unverified
Deep Diffusion Models and Unsupervised Hyperspectral Unmixing for Realistic Abundance Map Synthesis	Jun 16, 2025	Benchmarking, Data Augmentation	Unverified
Estimating Task Completion Times for Network Rollouts using Statistical Models within Partitioning-based Regression Methods	Nov 20, 2022	Benchmarking, regression	Unverified
DeepEdgeBench: Benchmarking Deep Neural Networks on Edge Devices	Aug 21, 2021	Benchmarking, Edge-computing	Unverified
Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum Devices	Feb 10, 2024	Benchmarking	Unverified
Deeper Insights into the Robustness of ViTs towards Common Corruptions	Apr 26, 2022	Benchmarking, Data Augmentation	Unverified
DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection	Jun 6, 2025	Benchmarking, DeepFake Detection	Unverified
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding	May 26, 2025	Benchmarking	Unverified
Evaluating Cultural and Social Awareness of LLM Web Agents	Oct 30, 2024	Benchmarking, Navigate	Unverified
Evaluating the Performance of Large Language Models via Debates	Jun 16, 2024	Benchmarking	Unverified
Deep Generative Models for Physiological Signals: A Systematic Literature Review	Jul 12, 2023	Benchmarking, EEG	Unverified
Deep Hedging of Long-Term Financial Derivatives	Jul 29, 2020	Benchmarking, Deep Reinforcement Learning	Unverified
Evolutionary Multimodal Optimization: A Short Survey	Aug 3, 2015	Benchmarking, Diversity	Unverified
Deep Imputation of Missing Values in Time Series Health Data: A Review with Benchmarking	Feb 10, 2023	Benchmarking, Deep Learning	Unverified
CayleyPy RL: Pathfinding and Reinforcement Learning on Cayley Graphs	Feb 25, 2025	Benchmarking, reinforcement-learning	Unverified
Deep Learning and Knowledge-Based Methods for Computer Aided Molecular Design -- Toward a Unified Approach: State-of-the-Art and Future Directions	May 18, 2020	Benchmarking, Deep Learning	Unverified
Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop	Jul 14, 2025	Benchmarking	Unverified
An EEG-based Stereoscopic Research to Reveal the Brain's Response to What Happens Before and After Watching 2D and 3D Movies	Mar 13, 2019	Benchmarking, EEG	Unverified
Deep learning for action spotting in association football videos	Oct 2, 2024	Action Spotting, Benchmarking	Unverified
CausalRivers -- Scaling up benchmarking of causal discovery for real-world time-series	Mar 21, 2025	Anomaly Detection, Benchmarking	Unverified
Deep learning for extracting protein-protein interactions from biomedical literature	Jun 5, 2017	Benchmarking, Cross-corpus	Unverified
Deep learning for molecular design - a review of the state of the art	Mar 11, 2019	Benchmarking, reinforcement-learning	Unverified
Optimal Design of Volt/VAR Control Rules of Inverters using Deep Learning	Nov 17, 2022	Benchmarking, Unity	Unverified
Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions	Jun 25, 2020	Benchmarking, Drug Discovery	Unverified
Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation	Jul 17, 2017	Benchmarking, Pose Estimation	Unverified
Causal Reasoning Meets Visual Representation Learning: A Prospective Study	Apr 26, 2022	Benchmarking, Out-of-Distribution Generalization	Unverified
Benchmarking and Enhancing Surgical Phase Recognition Models for Robotic-Assisted Esophagectomy	Dec 5, 2024	Benchmarking, Decoder	Unverified
Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment	Sep 29, 2021	Atari Games, Benchmarking	Unverified
Deep Learning vs. Gradient Boosting: Benchmarking state-of-the-art machine learning algorithms for credit scoring	May 21, 2022	Benchmarking, Binary Classification	Unverified
Deeply Supervised Depth Map Super-Resolution as Novel View Synthesis	Aug 27, 2018	Benchmarking, Blocking	Unverified
Benchmarking Graph Learning for Drug-Drug Interaction Prediction	Oct 24, 2024	Benchmarking, Graph Learning	Unverified
Deep Nets: What have they ever done for Vision?	May 10, 2018	Benchmarking	Unverified
An Early Warning Sign of Critical Transition in The Antarctic Ice Sheet -- A Data Driven Tool for Spatiotemporal Tipping Point	Apr 21, 2020	Benchmarking, Clustering	Unverified
A Dataset for Movie Description	Jan 12, 2015	Benchmarking, Descriptive	Unverified
Benchmarking and Enhancing Disentanglement in Concept-Residual Models	Nov 30, 2023	Benchmarking, Disentanglement	Unverified
EnzChemRED, a rich enzyme chemistry relation extraction dataset	Apr 22, 2024	Benchmarking, named-entity-recognition	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified