Benchmarking

Papers

Showing 4251–4275 of 5548 papers

Title	Date	Tasks	Status
Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet	Apr 2, 2025	Benchmarking	Unverified
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Sep 25, 2024	Benchmarking, Formal Logic	Unverified
A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design	May 19, 2025	Benchmarking, Drug Discovery	Unverified
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation	Feb 10, 2024	Benchmarking, Language Modeling	Unverified
Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis	Jan 21, 2021	Benchmarking	Unverified
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking	May 13, 2022	Benchmarking, reinforcement-learning	Unverified
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	Benchmarking, Multiple-choice	Unverified
UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library	Aug 20, 2024	Benchmarking, Computational Efficiency	Unverified
Automatic detection of passable roads after floods in remote sensed and social media data	Jan 10, 2019	Benchmarking, Transfer Learning	Unverified
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice	Feb 28, 2025	Benchmarking, Diagnostic	Unverified
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents	Jan 3, 2025	Benchmarking	Unverified
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms	Oct 11, 2023	Benchmarking, Denoising	Unverified
Automated Structured Radiology Report Generation	May 30, 2025	Benchmarking	Unverified
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration	Jun 9, 2023	Benchmarking, Time Series	Unverified
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified
Automated Machine Learning on Big Data using Stochastic Algorithm Tuning	Jul 30, 2014	Bayesian Optimisation, Benchmarking	Unverified
Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing	Jun 27, 2023	Benchmarking	Unverified
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension	Dec 16, 2024	Benchmarking, Image Captioning	Unverified
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models	Dec 30, 2023	Benchmarking, image-classification	Unverified
Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A	Jun 1, 2015	Benchmarking, Face Detection	Unverified
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	Benchmarking, Legal Reasoning	Unverified
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models	Apr 1, 2025	Benchmarking, Conversational Question Answering	Unverified
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem	Jul 13, 2024	Benchmarking, Deep Learning	Unverified
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN)	Nov 23, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy	Mar 18, 2021	Art Analysis, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified