Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2301–2325 of 5548 papers

Title	Date	Tasks	Status
BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer	Jan 11, 2021	BenchmarkingBinary Relation Extraction	—Unverified
A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection	Nov 1, 2016	BenchmarkingObject	—Unverified
BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function	Apr 9, 2021	BenchmarkingGeneral Classification	—Unverified
Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data	Mar 24, 2018	BenchmarkingPrediction	—Unverified
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets	May 16, 2025	BenchmarkingKnowledge Graphs	—Unverified
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems	Jul 7, 2022	Benchmarking	—Unverified
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving	Mar 6, 2024	Automated Theorem ProvingBenchmarking	—Unverified
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents	Jun 11, 2025	Benchmarking	—Unverified
A Metadata-Driven Approach to Understand Graph Neural Networks	Oct 30, 2023	BenchmarkingGraph Learning	—Unverified
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models	Jun 3, 2025	BenchmarkingDomain Adaptation	—Unverified
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning	Dec 3, 2023	BenchmarkingMulti-agent Reinforcement Learning	—Unverified
BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text	May 22, 2025	BenchmarkingRAG	—Unverified
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving	Dec 6, 2024	Autonomous DrivingBenchmarking	—Unverified
FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking	Apr 2, 2025	3D Scene ReconstructionBenchmarking	—Unverified
Benchmarks as Microscopes: A Call for Model Metrology	Jul 22, 2024	Benchmarkingmodel	—Unverified
Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge	Apr 3, 2025	AnatomyBenchmarking	—Unverified
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance	Mar 7, 2025	ArticlesBenchmarking	—Unverified
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures	Jan 1, 2024	BenchmarkingInstance Segmentation	—Unverified
Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods	May 2, 2024	Benchmarking	—Unverified
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency	Jan 30, 2025	BenchmarkingLanguage Modeling	—Unverified
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance	Mar 1, 2024	BenchmarkingStance Detection	—Unverified
ALT: A Python Package for Lightweight Feature Representation in Time Series Classification	Apr 17, 2025	BenchmarkingTime Series	—Unverified
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets	Oct 7, 2023	Benchmarkingnamed-entity-recognition	—Unverified
Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text	Aug 3, 2022	BenchmarkingData Augmentation	—Unverified
Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure	Jan 12, 2025	BenchmarkingHyperparameter Optimization	—Unverified

Show:10 25 50

← PrevPage 93 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified