Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2326–2350 of 5548 papers

Title	Date	Tasks	Status
Leveraging LLMs to Create a Haptic Devices' Recommendation System	Jan 22, 2025	Benchmarking	—Unverified
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Jan 22, 2025	Benchmarking	CodeCode Available
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF	Jan 22, 2025	BenchmarkingHallucination	—Unverified
Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs)	Jan 21, 2025	Benchmarking	—Unverified
Benchmarking Randomized Optimization Algorithms on Binary, Permutation, and Combinatorial Problem Landscapes	Jan 21, 2025	Benchmarking	—Unverified
Optimally-Weighted Maximum Mean Discrepancy Framework for Continual Learning	Jan 21, 2025	BenchmarkingContinual Learning	—Unverified
Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems	Jan 21, 2025	Autonomous VehiclesBenchmarking	CodeCode Available
Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing	Jan 20, 2025	BenchmarkingEvolutionary Algorithms	—Unverified
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model	Jan 20, 2025	Benchmarking	—Unverified
Benchmarking Large Language Models via Random Variables	Jan 20, 2025	BenchmarkingMathematical Reasoning	—Unverified
An Interpretable Measure for Quantifying Predictive Dependence between Continuous Random Variables -- Extended Version	Jan 18, 2025	Benchmarking	—Unverified
FORLAPS: An Innovative Data-Driven Reinforcement Learning Approach for Prescriptive Process Monitoring	Jan 17, 2025	BenchmarkingData Augmentation	—Unverified
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance	Jan 17, 2025	BenchmarkingMulti-agent Reinforcement Learning	CodeCode Available
Village-Net Clustering: A Rapid approach to Non-linear Unsupervised Clustering of High-Dimensional Data	Jan 16, 2025	BenchmarkingClustering	—Unverified
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU	Jan 16, 2025	Benchmarkingcontinuous-control	CodeCode Available
Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction	Jan 15, 2025	Activity PredictionBenchmarking	—Unverified
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging	Jan 15, 2025	BenchmarkingComputational Efficiency	—Unverified
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents	Jan 15, 2025	BenchmarkingOptical Character Recognition (OCR)	—Unverified
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles	Jan 15, 2025	Benchmarking	CodeCode Available
Off-policy Evaluation for Payments at Adyen	Jan 15, 2025	BenchmarkingDecision Making	—Unverified
Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval	Jan 15, 2025	BenchmarkingContrastive Learning	—Unverified
Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning	Jan 14, 2025	BenchmarkingManagement	—Unverified
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3	Jan 14, 2025	BenchmarkingC++ code	—Unverified
Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition	Jan 14, 2025	Activity RecognitionBenchmarking	—Unverified
Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving	Jan 14, 2025	Autonomous DrivingBenchmarking	—Unverified

Show:10 25 50

← PrevPage 94 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified