Benchmarking

Papers

Showing 3201–3225 of 5548 papers

Title	Date	Tasks	Status
Benchmarking projective simulation in navigation problems	Apr 23, 2018	Benchmarking, Q-Learning	Unverified
Benchmarking Processor Performance by Multi-Threaded Machine Learning Algorithms	Sep 11, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
JuStRank: Benchmarking LLM Judges for System Ranking	Dec 12, 2024	Benchmarking	Unverified
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images	Dec 12, 2023	Benchmarking, Retrieval	Unverified
Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling	Jan 6, 2022	Aerial Scene Classification, Benchmarking	Unverified
AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing	Jan 9, 2023	Anomaly Detection, Benchmarking	Unverified
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models	Apr 17, 2025	Benchmarking, Math	Unverified
Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks	May 24, 2024	Benchmarking, Decoder	Unverified
KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making	Jul 31, 2024	Benchmarking, Decision Making	Unverified
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3	Jan 14, 2025	Benchmarking, C++ code	Unverified
KetGPT -- Dataset Augmentation of Quantum Circuits using Transformers	Feb 20, 2024	Benchmarking	Unverified
Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy	Dec 4, 2024	Anatomy, Benchmarking	Unverified
Classification of Single-View Object Point Clouds	Dec 18, 2020	3D Object Classification, 6D Pose Estimation using RGB	Unverified
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design	Apr 14, 2025	Benchmarking, Language Modeling	Unverified
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition	Mar 24, 2025	Benchmarking, Food Recognition	Unverified
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation	May 24, 2025	Benchmarking, Question Answering	Unverified
Benchmarking person re-identification approaches and training datasets for practical real-world implementations	Sep 29, 2021	Benchmarking, Pedestrian Detection	Unverified
Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations	Aug 3, 2024	Benchmarking, Deep Reinforcement Learning	Unverified
Knowledge-aware contrastive heterogeneous molecular graph learning	Feb 17, 2025	Benchmarking, Contrastive Learning	Unverified
AEON: Adaptive Estimation of Instance-Dependent In-Distribution and Out-of-Distribution Label Noise for Robust Learning	Jan 23, 2025	Benchmarking, image-classification	Unverified
TIIF-Bench: How Does Your T2I Model Follow Your Instructions?	Jun 2, 2025	Benchmarking, Instruction Following	Unverified
Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model Benchmarking	Jan 10, 2024	Benchmarking, Information Retrieval	Unverified
3D Compositional Zero-shot Learning with DeCompositional Consensus	Nov 29, 2021	Benchmarking, Compositional Zero-Shot Learning	Unverified
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems	Jul 27, 2023	Benchmarking, GPU	Unverified
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges	Mar 6, 2025	Benchmarking, Language Modeling	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified