Benchmarking

Papers

Showing 2826–2850 of 5548 papers

Title	Date	Tasks	Status
Abasy Atlas v2.2: The most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization	May 5, 2020	Benchmarking	Unverified
The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach	Apr 27, 2025	Benchmarking, Decision Making	Unverified
BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer	Jan 11, 2021	Benchmarking, Binary Relation Extraction	Unverified
Greening AI-enabled Systems with Software Engineering: A Research Agenda for Environmentally Sustainable AI Practices	Jun 2, 2025	Benchmarking	Unverified
Grid Search Hyperparameter Benchmarking of BERT, ALBERT, and LongFormer on DuoRC	Jan 15, 2021	Benchmarking, Language Modeling	Unverified
BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function	Apr 9, 2021	Benchmarking, General Classification	Unverified
AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI	Jan 9, 2025	Benchmarking, named-entity-recognition	Unverified
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems	Jul 7, 2022	Benchmarking	Unverified
AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models	Sep 5, 2023	Benchmarking, Zero-Shot Learning	Unverified
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents	Jun 11, 2025	Benchmarking	Unverified
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning	Dec 3, 2023	Benchmarking, Multi-agent Reinforcement Learning	Unverified
gSuite: A Flexible and Framework Independent Benchmark Suite for Graph Neural Network Inference on GPUs	Oct 20, 2022	Benchmarking, Computational Efficiency	Unverified
GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation	Jul 8, 2024	Benchmarking, Graph Embedding	Unverified
Benchmarks as Microscopes: A Call for Model Metrology	Jul 22, 2024	Benchmarking, model	Unverified
The Curious Case of Integrator Reach Sets, Part I: Basic Theory	Feb 23, 2021	Benchmarking	Unverified
Guidelines for Fine-grained Sentence-level Arabic Readability Annotation	Oct 11, 2024	Benchmarking, Sentence	Unverified
Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks	May 21, 2025	Benchmarking, GPU	Unverified
Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge	Apr 3, 2025	Anatomy, Benchmarking	Unverified
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance	Mar 1, 2024	Benchmarking, Stance Detection	Unverified
VoiceWukong: Benchmarking Deepfake Voice Detection	Sep 10, 2024	Benchmarking, Face Swapping	Unverified
h4rm3l: A language for Composable Jailbreak Attack Synthesis	Aug 9, 2024	Benchmarking, Program Synthesis	Unverified
Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text	Aug 3, 2022	Benchmarking, Data Augmentation	Unverified
Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure	Jan 12, 2025	Benchmarking, Hyperparameter Optimization	Unverified
AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems	May 26, 2025	Benchmarking, Recommendation Systems	Unverified
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images	Nov 7, 2024	Anatomy, Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified