Benchmarking

Papers

Showing 5126–5150 of 5548 papers

Title	Date	Tasks	Status
A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice	Dec 20, 2024	Benchmarking, Diagnostic	Code Available
Arena-Rosnav 2.0: A Development and Benchmarking Platform for Robot Navigation in Highly Dynamic Environments	Feb 20, 2023	Benchmarking, Robot Navigation	Code Available
On the Fragility of Active Learners for Text Classification	Mar 23, 2024	Active Learning, Benchmarking	Code Available
Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation	Oct 29, 2021	Benchmarking, Brain Tumor Segmentation	Code Available
Benchmarking Large Language Models on Communicative Medical Coaching: a Novel System and Dataset	Feb 8, 2024	Benchmarking	Code Available
Benchmarking Large Language Models for Math Reasoning Tasks	Aug 20, 2024	Benchmarking, In-Context Learning	Code Available
Benchmarking Large Language Models for Image Classification of Marine Mammals	Oct 22, 2024	Benchmarking, image-classification	Code Available
On the Loss of Context-awareness in General Instruction Fine-tuning	Nov 5, 2024	Benchmarking, Instruction Following	Code Available
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation	May 16, 2025	Benchmarking, Ethics	Code Available
SNaC: Coherence Error Detection for Narrative Summarization	May 19, 2022	Benchmarking, Coherence Evaluation	Code Available
SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services	May 29, 2025	Benchmarking, Information Retrieval	Code Available
Using Motif Transitions for Temporal Graph Generation	Jun 19, 2023	Benchmarking, Graph Generation	Code Available
Accurate Peak Detection in Multimodal Optimization via Approximated Landscape Learning	Mar 23, 2025	Benchmarking	Code Available
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias	Jul 3, 2024	Benchmarking, Bias Detection	Code Available
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams	Jun 17, 2024	All, Benchmarking	Code Available
Word Embeddings for the Construction Domain	Oct 28, 2016	Benchmarking, General Classification	Code Available
What Actions are Needed for Understanding Human Actions in Videos?	Aug 9, 2017	Benchmarking	Code Available
ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness	Jun 1, 2025	Benchmarking, Management	Code Available
On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers	Mar 16, 2022	Benchmarking	Code Available
On the Use of ArXiv as a Dataset	Apr 30, 2019	Articles, Author Attribution	Code Available
On the use of automatically generated synthetic image datasets for benchmarking face recognition	Jun 8, 2021	Benchmarking, Face Recognition	Code Available
Benchmarking Large Language Models for Molecule Prediction Tasks	Mar 8, 2024	Benchmarking, Prediction	Code Available
Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS	Apr 9, 2024	Benchmarking, Neural Architecture Search	Code Available
SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds	May 17, 2025	Benchmarking, Binary Classification	Code Available
On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition	Jun 6, 2021	Benchmarking, Memorization	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified