SOTAVerified

Benchmarking

Papers

Showing 2301-2325 of 5548 papers

Title | Status | Hype
BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph Transformer | | 0
A Benchmark Dataset and Saliency-guided Stacked Autoencoders for Video-based Salient Object Detection | | 0
BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function | | 0
Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data | | 0
Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets | | 0
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | | 0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving | | 0
Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | | 0
A Metadata-Driven Approach to Understand Graph Neural Networks | | 0
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models | | 0
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning | | 0
BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text | | 0
ACT-Bench: Towards Action Controllable World Models for Autonomous Driving | | 0
FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking | | 0
Benchmarks as Microscopes: A Call for Model Metrology | | 0
Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge | | 0
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance | | 0
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures | | 0
Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods | | 0
Fine-tuning LLaMA 2 interference: a comparative study of language implementations for optimal efficiency | | 0
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance | | 0
ALT: A Python Package for Lightweight Feature Representation in Time Series Classification | | 0
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | | 0
Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text | | 0
Benchmarking YOLOv8 for Optimal Crack Detection in Civil Infrastructure | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified