Benchmarking

Papers


Showing 1051–1075 of 5548 papers

Title	Date	Tasks	Status	Hype
An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version	Jan 18, 2025	Benchmarking	Unverified	0
FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring	Jan 17, 2025	Benchmarking, Data Augmentation	Unverified	0
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance	Jan 17, 2025	Benchmarking, Multi-agent Reinforcement Learning	Code Available	0
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU	Jan 16, 2025	Benchmarking, continuous-control	Code Available	0
Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data	Jan 16, 2025	Benchmarking, Clustering	Unverified	0
SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation	Jan 16, 2025	Benchmarking	Code Available	5
Off-policy Evaluation for Payments at Adyen	Jan 15, 2025	Benchmarking, Decision Making	Unverified	0
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging	Jan 15, 2025	Benchmarking, Computational Efficiency	Unverified	0
Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction	Jan 15, 2025	Activity Prediction, Benchmarking	Unverified	0
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents	Jan 15, 2025	Benchmarking, Optical Character Recognition (OCR)	Unverified	0
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval	Jan 15, 2025	Benchmarking, Contrastive Learning	Unverified	0
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind	Jan 15, 2025	Benchmarking, Multiple-choice	Code Available	1
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles	Jan 15, 2025	Benchmarking	Code Available	0
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Jan 15, 2025	Benchmarking, Hallucination	Code Available	1
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3	Jan 14, 2025	Benchmarking, C++ code	Unverified	0
Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition	Jan 14, 2025	Activity Recognition, Benchmarking	Unverified	0
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models	Jan 14, 2025	Benchmarking, Text-to-Video Generation	Code Available	4
Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving	Jan 14, 2025	Autonomous Driving, Benchmarking	Unverified	0
Benchmarking Multimodal Models for Fine-Grained Image Analysis: A Comparative Study Across Diverse Visual Features	Jan 14, 2025	Benchmarking	Unverified	0
Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning	Jan 14, 2025	Benchmarking, Management	Unverified	0
Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings	Jan 14, 2025	Benchmarking, Question Answering	Unverified	0
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification	Jan 14, 2025	Benchmarking, Graph Representation Learning	Code Available	0
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles	Jan 13, 2025	Articles, Benchmarking	Unverified	0
Stronger Than You Think: Benchmarking Weak Supervision on Realistic Tasks	Jan 13, 2025	Benchmarking	Code Available	0
WebWalker: Benchmarking LLMs in Web Traversal	Jan 13, 2025	Benchmarking, Open-Domain Question Answering	Code Available	11


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	n/a	Unverified