SOTAVerified

Benchmarking

Papers

Showing 3021-3030 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| The Elusive Pursuit of Reproducing PATE-GAN: Benchmarking, Auditing, Debugging | Code | 0 |
| Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | | 0 |
| Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN | Code | 0 |
| Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications | | 0 |
| M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere | Code | 0 |
| Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | | 0 |
| Exploring and Benchmarking the Planning Capabilities of Large Language Models | | 0 |
| MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |