Benchmarking

Papers

Showing 1126–1150 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	Code Available	1	5
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift	Dec 15, 2022	Benchmarking, Image Captioning	Code Available	1	5
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
Benchmarking the Performance of Bayesian Optimization across Multiple Experimental Materials Science Domains	May 23, 2021	Active Learning, Bayesian Optimisation	Code Available	1	5
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	Code Available	1	5
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions	Feb 28, 2024	Benchmarking, Multiple-choice	Code Available	1	5
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing	Mar 2, 2024	Attribute, Benchmarking	Code Available	1	5
Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform	Jul 15, 2020	Articles, Benchmarking	Code Available	1	5
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions	Jan 1, 2024	Benchmarking, Instruction Following	Code Available	1	5
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents	Apr 9, 2024	Benchmarking	Code Available	1	5
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions	Mar 29, 2024	Action Detection, Benchmarking	Code Available	1	5
Benchmarking Robustness of Machine Reading Comprehension Models	Apr 29, 2020	Benchmarking, Machine Reading Comprehension	Code Available	1	5
Benchmarking machine learning models on multi-centre eICU critical care dataset	Oct 2, 2019	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
German's Next Language Model	Oct 21, 2020	Benchmarking, Document Classification	Code Available	1	5
GraphArena: Benchmarking Large Language Models on Graph Computational Problems	Jun 29, 2024	Benchmarking, Hallucination	Code Available	1	5
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns	Jan 28, 2025	Adversarial Attack, Benchmarking	Code Available	1	5
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding	Nov 6, 2023	Benchmarking, Data Compression	Code Available	1	5
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data	Feb 27, 2024	Benchmarking	Code Available	1	5
Benchmarking Meaning Representations in Neural Semantic Parsing	Nov 1, 2020	Benchmarking, Semantic Parsing	Code Available	1	5
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning	Sep 27, 2024	AutoML, Benchmarking	Code Available	1	5
Benchmarking Meta-embeddings: What Works and What Does Not	Nov 1, 2021	Benchmarking, Embeddings Evaluation	Code Available	1	5
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios	Oct 25, 2024	Benchmarking, Diversity	Code Available	1	5
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1	5
Generative Wind Power Curve Modeling Via Machine Vision: A Self-learning Deep Convolutional Network Based Method	Aug 19, 2021	Benchmarking, Synthetic Data Generation	Code Available	1	5
Benchmarking Large Language Models for News Summarization	Jan 31, 2023	Benchmarking, News Summarization	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified