Benchmarking

Papers

Showing 826–850 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
EduBench: A Comprehensive Benchmarking Dataset for Evaluating Large Language Models in Diverse Educational Scenarios	May 22, 2025	Benchmarking	Code Available	1	5
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling	Oct 14, 2022	Benchmarking, Language Modeling	Code Available	1	5
ByzFL: Research Framework for Robust Federated Learning	May 30, 2025	Benchmarking, Federated Learning	Code Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	Benchmarking, GPU	Code Available	1	5
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach	Oct 10, 2023	Benchmarking, Code Generation	Code Available	1	5
Benchmarking Meta-embeddings: What Works and What Does Not	Nov 1, 2021	Benchmarking, Embeddings Evaluation	Code Available	1	5
EgoNormia: Benchmarking Physical Social Norm Understanding	Feb 27, 2025	Answer Generation, Benchmarking	Code Available	1	5
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	Benchmarking, Community Detection	Code Available	1	5
COSMOS: Catching Out-of-Context Misinformation with Self-Supervised Learning	Jan 15, 2021	Benchmarking, Misinformation	Code Available	1	5
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoML, Benchmarking	Code Available	1	5
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk	Jul 2, 2022	Benchmarking, Machine Translation	Code Available	1	5
Benchmarking machine learning models on multi-centre eICU critical care dataset	Oct 2, 2019	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym	Dec 6, 2023	Benchmarking, Decision Making	Code Available	1	5
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	Code Available	1	5
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection	Mar 12, 2025	Benchmarking, Code Classification	Code Available	1	5
Improving and Benchmarking Offline Reinforcement Learning Algorithms	Jun 1, 2023	Attribute, Benchmarking	Code Available	1	5
IMUPoser: Full-Body Pose Estimation using IMUs in Phones, Watches, and Earbuds	Apr 25, 2023	Benchmarking, Pose Estimation	Code Available	1	5
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	Code Available	1	5
Benchmarking and Survey of Explanation Methods for Black Box Models	Feb 25, 2021	Benchmarking, Survey	Code Available	1	5
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders	Jun 4, 2024	Benchmarking, Clustering	Code Available	1	5
ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning	Feb 8, 2022	Benchmarking, Language Modelling	Code Available	1	5
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	Benchmarking, Cancer Classification	Code Available	1	5
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark	Jun 5, 2024	Benchmarking	Code Available	1	5
Benchmarking Meaning Representations in Neural Semantic Parsing	Nov 1, 2020	Benchmarking, Semantic Parsing	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified