Benchmarking

Papers

Showing 1051–1075 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation	Nov 4, 2023	Benchmarking, Drug Discovery	Code Available	1	5
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios	Oct 25, 2024	Benchmarking, Diversity	Code Available	1	5
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions	Sep 10, 2023	3D Human Pose Estimation, 3D Pose Estimation	Code Available	1	5
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning	Sep 27, 2024	AutoML, Benchmarking	Code Available	1	5
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph	May 23, 2025	Benchmarking, Management	Code Available	1	5
3D Common Corruptions and Data Augmentation	Mar 2, 2022	Benchmarking, Data Augmentation	Code Available	1	5
Continual Learning with Foundation Models: An Empirical Study of Latent Replay	Apr 30, 2022	Benchmarking, Continual Learning	Code Available	1	5
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents	Apr 9, 2024	Benchmarking	Code Available	1	5
Benchmarking Quantized Neural Networks on FPGAs with FINN	Feb 2, 2021	Benchmarking, Quantization	Code Available	1	5
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation	Feb 10, 2025	Benchmarking	Code Available	1	5
fseval: A Benchmarking Framework for Feature Selection and Feature Ranking Algorithms	Nov 23, 2022	Automated Feature Engineering, Benchmarking	Code Available	1	5
Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform	Jul 15, 2020	Articles, Benchmarking	Code Available	1	5
Are we really making much progress? Revisiting, benchmarking, and refining heterogeneous graph neural networks	Dec 30, 2021	Benchmarking, Heterogeneous Node Classification	Code Available	1	5
From Claims to Evidence: A Unified Framework and Critical Analysis of CNN vs. Transformer vs. Mamba in Medical Image Segmentation	Mar 3, 2025	Benchmarking, Computational Efficiency	Code Available	1	5
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios	May 22, 2025	Benchmarking, Instruction Following	Code Available	1	5
Benchmarking emergency department triage prediction models with machine learning and large public electronic health records	Nov 22, 2021	Benchmarking	Code Available	1	5
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs	Nov 29, 2023	Benchmarking	Code Available	1	5
Are Vision Language Models Ready for Clinical Diagnosis? A 3D Medical Benchmark for Tumor-centric Visual Question Answering	May 25, 2025	Anatomy, Benchmarking	Code Available	1	5
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis	Mar 9, 2021	Benchmarking, Classification	Code Available	1	5
3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding	Mar 30, 2021	Affordance Detection, Benchmarking	Code Available	1	5
Benchmarking Reinforcement Learning Techniques for Autonomous Navigation	Oct 10, 2022	Autonomous Navigation, Benchmarking	Code Available	1	5
Formalizing Multimedia Recommendation through Multimodal Deep Learning	Sep 11, 2023	Benchmarking, Deep Learning	Code Available	1	5
FTNet: Feature Transverse Network for Thermal Image Semantic Segmentation	Oct 26, 2021	Benchmarking, Scene Segmentation	Code Available	1	5
Flames: Benchmarking Value Alignment of LLMs in Chinese	Nov 12, 2023	Benchmarking, Fairness	Code Available	1	5
FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation	May 27, 2025	Benchmarking, Decision Making	Code Available	1	5

Page 43 of 222

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified