Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3426–3450 of 5548 papers

Title	Date	Tasks	Status
Benchmarking machine learning models for predicting aerofoil performance	Apr 22, 2025	Benchmarking	—Unverified
Benchmarking Machine Learning Models for Quantum Error Correction	Nov 18, 2023	Benchmarking	—Unverified
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models	Feb 17, 2025	Benchmarking	—Unverified
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage	Dec 20, 2024	AttributeBenchmarking	—Unverified
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning	Dec 21, 2019	BenchmarkingPrediction	—Unverified
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking	Nov 14, 2024	BenchmarkingDrug Discovery	—Unverified
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers	Apr 19, 2025	BenchmarkingDiagnostic	—Unverified
Benchmarking machine learning models for quantum state classification	Sep 14, 2023	BenchmarkingClassification	—Unverified
Towards a Benchmark for Scientific Understanding in Humans and Machines	Apr 20, 2023	BenchmarkingInformation Retrieval	—Unverified
Benchmarking Machine Learning Methods for Distributed Acoustic Sensing	Mar 26, 2025	BenchmarkingData Augmentation	—Unverified
Benchmarking Machine Learning: How Fast Can Your Algorithms Go?	Jan 8, 2021	BenchmarkingBIG-bench Machine Learning	—Unverified
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym	Sep 29, 2023	Bayesian OptimizationBenchmarking	—Unverified
GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors	Jun 9, 2025	BenchmarkingModel extraction	—Unverified
Low-Density 3D Point Cloud Classification	Oct 30, 2024	3D Point Cloud ClassificationAutonomous Driving	—Unverified
Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication	Nov 9, 2024	BenchmarkingIntegrated sensing and communication	—Unverified
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French	Jun 1, 2022	BenchmarkingLow Resource Neural Machine Translation	—Unverified
LSTM-based Whisper Detection	Sep 20, 2018	Benchmarking	—Unverified
Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives	Jun 27, 2024	Benchmarking	—Unverified
LucidDreaming: Controllable Object-Centric 3D Generation	Nov 30, 2023	3D GenerationBenchmarking	—Unverified
Benchmarking LLMs on the Semantic Overlap Summarization Task	Feb 26, 2024	BenchmarkingDocument Summarization	—Unverified
LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset	Feb 6, 2025	BenchmarkingComputed Tomography (CT)	—Unverified
Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders	Mar 7, 2025	BenchmarkingClick-Through Rate Prediction	—Unverified
Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving	May 29, 2020	Benchmarking	—Unverified
Benchmarking LLMs in Political Content Text-Annotation: Proof-of-Concept with Toxicity and Incivility Data	Sep 15, 2024	Benchmarkingtext annotation	—Unverified
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes	Oct 9, 2024	BenchmarkingMotion Generation	—Unverified

Show:10 25 50

← PrevPage 138 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified