Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–75 of 5548 papers

Title	Date	Tasks	Status	Hype
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning	Jun 24, 2025	BenchmarkingDrug Discovery	CodeCode Available	2
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges	Jun 24, 2025	BenchmarkingCode Generation	—Unverified	0
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping	Jun 23, 2025	BenchmarkingDiversity	CodeCode Available	0
Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey	Jun 23, 2025	BenchmarkingSurvey	—Unverified	0
Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions	Jun 23, 2025	BenchmarkingSensitivity	—Unverified	0
Staining normalization in histopathology: Method benchmarking using multicenter dataset	Jun 23, 2025	Benchmarking	—Unverified	0
Benchmarking Music Generation Models and Metrics via Human Preference Studies	Jun 23, 2025	BenchmarkingMusic Generation	—Unverified	0
Survey of HPC in US Research Institutions	Jun 23, 2025	BenchmarkingGPU	—Unverified	0
Statistical Multicriteria Evaluation of LLM-Generated Text	Jun 22, 2025	BenchmarkingDiversity	CodeCode Available	0
Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares	Jun 22, 2025	Benchmarkingregression	—Unverified	0
On the Robustness of Human-Object Interaction Detection against Distribution Shift	Jun 22, 2025	BenchmarkingData Augmentation	—Unverified	0
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods	Jun 22, 2025	Anomaly DetectionBenchmarking	CodeCode Available	2
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking	Jun 21, 2025	BenchmarkingReinforcement Learning (RL)	—Unverified	0
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices	Jun 21, 2025	BenchmarkingCPU	CodeCode Available	1
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques	Jun 20, 2025	BenchmarkingDimensionality Reduction	—Unverified	0
Universal Music Representations? Evaluating Foundation Models on World Music Corpora	Jun 20, 2025	BenchmarkingFew-Shot Learning	CodeCode Available	0
TabArena: A Living Benchmark for Machine Learning on Tabular Data	Jun 20, 2025	Benchmarking	CodeCode Available	3
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	BenchmarkingDescriptive	CodeCode Available	1
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors	Jun 19, 2025	BenchmarkingFace Swapping	—Unverified	0
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents	Jun 19, 2025	Benchmarking	—Unverified	0
Finance Language Model Evaluation (FLaME)	Jun 18, 2025	BenchmarkingLanguage Model Evaluation	—Unverified	0
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models	Jun 17, 2025	BenchmarkingLanguage Modeling	CodeCode Available	2
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions	Jun 17, 2025	Benchmarking	—Unverified	0
Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery	Jun 17, 2025	BenchmarkingDrug Discovery	—Unverified	0
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	BenchmarkingRetrieval	CodeCode Available	0

Show:10 25 50

← PrevPage 3 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified