Benchmarking

Papers


Showing 1526–1550 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Deep Learning and Vision Foundation Models for Atypical vs. Normal Mitosis Classification with Cross-Dataset Evaluation	Jun 26, 2025	Benchmarking, Transfer Learning	Code Available
mTSBench: Benchmarking Multivariate Time Series Anomaly Detection and Model Selection at Scale	Jun 26, 2025	Anomaly Detection, Benchmarking	Code Available
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge	Jun 26, 2025	Benchmarking	Unverified
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation	Jun 26, 2025	Attribute, Benchmarking	Unverified
FixCLR: Negative-Class Contrastive Learning for Semi-Supervised Domain Generalization	Jun 25, 2025	Benchmarking, Contrastive Learning	Unverified
Benchmarking Unsupervised Strategies for Anomaly Detection in Multivariate Time Series	Jun 25, 2025	Anomaly Detection, Benchmarking	Code Available
Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision	Jun 25, 2025	Benchmarking, Information Retrieval	Unverified
scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection	Jun 25, 2025	Benchmarking, Contrastive Learning	Unverified
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression	Jun 25, 2025	Benchmarking, Management	Unverified
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction	Jun 25, 2025	Benchmarking, Person Identification	Code Available
AI-Driven MRI-based Brain Tumour Segmentation Benchmarking	Jun 25, 2025	Benchmarking, Image Segmentation	Unverified
BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos	Jun 25, 2025	Artifact Detection, Benchmarking	Unverified
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences	Jun 25, 2025	Benchmarking	Code Available
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans	Jun 25, 2025	Action Detection, Benchmarking	Unverified
Quantitative Benchmarking of Anomaly Detection Methods in Digital Pathology	Jun 24, 2025	Anomaly Detection, Artifact Detection	Unverified
MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control	Jun 24, 2025	Benchmarking	Unverified
QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges	Jun 24, 2025	Benchmarking, Code Generation	Unverified
Staining normalization in histopathology: Method benchmarking using multicenter dataset	Jun 23, 2025	Benchmarking	Unverified
Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions	Jun 23, 2025	Benchmarking, Sensitivity	Unverified
Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey	Jun 23, 2025	Benchmarking, Survey	Unverified
Benchmarking Music Generation Models and Metrics via Human Preference Studies	Jun 23, 2025	Benchmarking, Music Generation	Unverified
Survey of HPC in US Research Institutions	Jun 23, 2025	Benchmarking, GPU	Unverified
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping	Jun 23, 2025	Benchmarking, Diversity	Code Available
Statistical Multicriteria Evaluation of LLM-Generated Text	Jun 22, 2025	Benchmarking, Diversity	Code Available
On the Robustness of Human-Object Interaction Detection against Distribution Shift	Jun 22, 2025	Benchmarking, Data Augmentation	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified