Benchmarking

Papers

Showing 3021–3030 of 5548 papers

Title	Date	Tasks	Status	Hype
The Elusive Pursuit of Reproducing PATE-GAN: Benchmarking, Auditing, Debugging	Jun 20, 2024	Benchmarking	Code Available	0
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models	Jun 19, 2024	Benchmarking, Open-Domain Question Answering	Unverified	0
Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN	Jun 19, 2024	Benchmarking, Intrusion Detection	Code Available	0
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective	Jun 19, 2024	Benchmarking, Continual Pretraining	Unverified	0
Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration	Jun 19, 2024	Benchmarking, Distractor Generation	Unverified	0
Comparison of Open-Source and Proprietary LLMs for Machine Reading Comprehension: A Practical Analysis for Industrial Applications	Jun 19, 2024	Benchmarking, Machine Reading Comprehension	Unverified	0
M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere	Jun 19, 2024	Benchmarking, Spatio-Temporal Forecasting	Code Available	0
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance	Jun 18, 2024	Benchmarking	Unverified	0
Exploring and Benchmarking the Planning Capabilities of Large Language Models	Jun 18, 2024	Benchmarking, In-Context Learning	Unverified	0
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts	Jun 18, 2024	Articles, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified