SOTAVerified

Benchmarking

Papers

Showing 1691–1700 of 5548 papers

Title	Date	Tasks	Status	Hype
LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts	Sep 5, 2024	Benchmarking	Code Available	0
InfraLib: Enabling Reinforcement Learning and Decision-Making for Large-Scale Infrastructure Management	Sep 5, 2024	Benchmarking, Computational Efficiency	Unverified	0
RTLRewriter: Methodologies for Large Models aided RTL Code Optimization	Sep 4, 2024	Benchmarking	Code Available	1
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation	Sep 4, 2024	Benchmarking	Unverified	0
NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks	Sep 4, 2024	Anomaly Detection, Benchmarking	Unverified	0
Benchmarking Spurious Bias in Few-Shot Image Classifiers	Sep 4, 2024	Attribute, Benchmarking	Code Available	0
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study	Sep 3, 2024	Benchmarking, Hallucination	Code Available	0
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs	Sep 3, 2024	16k, Benchmarking	Code Available	1
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision	Sep 3, 2024	Benchmarking, Mixed Reality	Unverified	0
Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture	Sep 3, 2024	Benchmarking, RAG	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified