Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1201–1225 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	BenchmarkingGeneral Knowledge	CodeCode Available	1	5
Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	May 28, 2024	BenchmarkingFeature Engineering	CodeCode Available	1	5
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery	Mar 24, 2025	BenchmarkingHumanitarian	CodeCode Available	1	5
Benchmarking Object Detectors with COCO: A New Path Forward	Mar 27, 2024	BenchmarkingObject	CodeCode Available	1	5
KO codes: Inventing Nonlinear Encoding and Decoding for Reliable Wireless Communication via Deep-learning	Aug 29, 2021	BenchmarkingDecoder	CodeCode Available	1	5
KoLA: Carefully Benchmarking World Knowledge of Large Language Models	Jun 15, 2023	BenchmarkingHallucination	CodeCode Available	1	5
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk	Jul 2, 2022	BenchmarkingMachine Translation	CodeCode Available	1	5
Can Language Models Employ the Socratic Method? Experiments with Code Debugging	Oct 4, 2023	Benchmarking	CodeCode Available	1	5
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs	Jun 22, 2023	Arithmetic ReasoningBenchmarking	CodeCode Available	1	5
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning	Jun 16, 2023	Active LearningBenchmarking	CodeCode Available	1	5
Working Memory Capacity of ChatGPT: An Empirical Study	Apr 30, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1	5
Benchmarking Large Language Models for Automated Verilog RTL Code Generation	Dec 13, 2022	BenchmarkingCode Generation	CodeCode Available	1	5
Benchmarking Spatial Relationships in Text-to-Image Generation	Dec 20, 2022	BenchmarkingImage Generation	CodeCode Available	1	5
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection	Mar 12, 2025	BenchmarkingCode Classification	CodeCode Available	1	5
A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems	May 11, 2021	BenchmarkingEdge-computing	CodeCode Available	1	5
3DYoga90: A Hierarchical Video Dataset for Yoga Pose Understanding	Oct 16, 2023	Action RecognitionBenchmarking	CodeCode Available	1	5
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs	Apr 5, 2021	BenchmarkingKnowledge Graphs	CodeCode Available	1	5
CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital Twins	Jan 6, 2024	Autonomous VehiclesBenchmarking	CodeCode Available	1	5
Benchmarking Simulation-Based Inference	Jan 12, 2021	Benchmarking	CodeCode Available	1	5
GuacaMol: Benchmarking Models for De Novo Molecular Design	Nov 22, 2018	BenchmarkingDrug Discovery	CodeCode Available	1	5
Benchmarking Language Models for Code Syntax Understanding	Oct 26, 2022	Benchmarking	CodeCode Available	1	5
Large Language Models for Multi-Robot Systems: A Survey	Feb 6, 2025	Action GenerationBenchmarking	CodeCode Available	1	5
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction	Nov 16, 2023	BenchmarkingEvent Extraction	CodeCode Available	1	5
Chaos as an interpretable benchmark for forecasting and data-driven modelling	Oct 11, 2021	BenchmarkingSymbolic Regression	CodeCode Available	1	5
Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning	Nov 8, 2021	Adversarial RobustnessBenchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 49 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified