Benchmarking

Papers

Showing 5426–5450 of 5548 papers

Title	Date	Tasks	Status
Transformers for Green Semantic Communication: Less Energy, More Semantics	Oct 11, 2023	Benchmarking, CPU	Code Available
Benchmarking Data Efficiency in Δ-ML and Multifidelity Models for Quantum Chemistry	Oct 15, 2024	Benchmarking	Code Available
ViP: Video Platform for PyTorch	Oct 7, 2019	Benchmarking, Video Understanding	Code Available
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language	May 15, 2025	Benchmarking, Optical Character Recognition	Code Available
Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification	Feb 8, 2022	Anomaly Detection, Benchmarking	Code Available
Towards Efficient Synchronous Federated Training: A Survey on System Optimization Strategies	Sep 9, 2021	Benchmarking, Federated Learning	Code Available
Which Model to Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks	Oct 25, 2021	Benchmarking, continuous-control	Code Available
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation	Dec 4, 2020	Benchmarking, Machine Translation	Code Available
Comparative Analysis: Violence Recognition from Videos using Transfer Learning	Aug 26, 2024	Action Recognition, Benchmarking	Code Available
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available
Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis	Mar 18, 2025	Benchmarking, Drug Response Prediction	Code Available
Compact Trilinear Interaction for Visual Question Answering	Sep 26, 2019	Benchmarking, Knowledge Distillation	Code Available
Benchmarking Classic and Learned Navigation in Complex 3D Environments	Jan 30, 2019	Benchmarking	Code Available
An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data	Dec 6, 2024	Benchmarking, Imputation	Code Available
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models	Jun 15, 2024	Benchmarking, Data Augmentation	Code Available
VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models	Feb 23, 2025	Benchmarking, Spatial Reasoning	Code Available
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance	Jan 17, 2025	Benchmarking, Multi-agent Reinforcement Learning	Code Available
CODES: Benchmarking Coupled ODE Surrogates	Oct 28, 2024	Benchmarking, Uncertainty Quantification	Code Available
CodeS: Towards Code Model Generalization Under Distribution Shift	Jun 11, 2022	Benchmarking, Code Classification	Code Available
Code Ownership in Open-Source AI Software Security	Dec 18, 2023	Benchmarking	Code Available
Benchmarking ChatGPT on Algorithmic Reasoning	Apr 4, 2024	Benchmarking	Code Available
COCO: Performance Assessment	May 11, 2016	Benchmarking	Code Available
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)	Apr 5, 2024	Benchmarking	Code Available
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology	Apr 24, 2023	Benchmarking, Decision Making	Code Available
Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs	May 21, 2025	Benchmarking, Question Answering	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified