Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1551–1575 of 5548 papers

Title	Date	Tasks	Status
Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares	Jun 22, 2025	Benchmarkingregression	—Unverified
Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking	Jun 21, 2025	BenchmarkingReinforcement Learning (RL)	—Unverified
Universal Music Representations? Evaluating Foundation Models on World Music Corpora	Jun 20, 2025	BenchmarkingFew-Shot Learning	CodeCode Available
A Comparative Analysis of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as Dimensionality Reduction Techniques	Jun 20, 2025	BenchmarkingDimensionality Reduction	—Unverified
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents	Jun 19, 2025	Benchmarking	—Unverified
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors	Jun 19, 2025	BenchmarkingFace Swapping	—Unverified
Finance Language Model Evaluation (FLaME)	Jun 18, 2025	BenchmarkingLanguage Model Evaluation	—Unverified
Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery	Jun 17, 2025	BenchmarkingDrug Discovery	—Unverified
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge	Jun 17, 2025	BenchmarkingRetrieval	CodeCode Available
Egocentric Human-Object Interaction Detection: A New Benchmark and Method	Jun 17, 2025	BenchmarkingHuman-Object Interaction Detection	—Unverified
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning	Jun 17, 2025	BenchmarkingSelf-Supervised Learning	—Unverified
PGLib-CO2: A Power Grid Library for Computing and Optimizing Carbon Emissions	Jun 17, 2025	Benchmarking	—Unverified
Deep Diffusion Models and Unsupervised Hyperspectral Unmixing for Realistic Abundance Map Synthesis	Jun 16, 2025	BenchmarkingData Augmentation	—Unverified
Robustness of Reinforcement Learning-Based Traffic Signal Control under Incidents: A Comparative Study	Jun 16, 2025	BenchmarkingTraffic Signal Control	—Unverified
JENGA: Object selection and pose estimation for robotic grasping from a stack	Jun 16, 2025	BenchmarkingObject	—Unverified
A Comprehensive Survey on Video Scene Parsing:Advances, Challenges, and Prospects	Jun 16, 2025	BenchmarkingInstance Segmentation	—Unverified
Few-Shot Learning for Industrial Time Series: A Comparative Analysis Using the Example of Screw-Fastening Process Monitoring	Jun 16, 2025	BenchmarkingFew-Shot Learning	—Unverified
C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation	Jun 16, 2025	BenchmarkingRecommendation Systems	CodeCode Available
MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios	Jun 15, 2025	Benchmarking	CodeCode Available
A large-scale, physically-based synthetic dataset for satellite pose estimation	Jun 15, 2025	BenchmarkingDataset Generation	—Unverified
Learning Best Paths in Quantum Networks	Jun 14, 2025	Benchmarking	—Unverified
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark	Jun 14, 2025	BenchmarkingGraph Learning	CodeCode Available
Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation	Jun 13, 2025	Anomaly DetectionBenchmarking	—Unverified
SemanticST: Spatially Informed Semantic Graph Learning for Clustering, Integration, and Scalable Analysis of Spatial Transcriptomics	Jun 13, 2025	BenchmarkingContrastive Learning	—Unverified
EconGym: A Scalable AI Testbed with Diverse Economic Tasks	Jun 13, 2025	Benchmarking	—Unverified

Show:10 25 50

← PrevPage 63 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified