SOTAVerified

Benchmarking

Papers

Showing 1211–1220 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1 |
| CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | Code | 1 |
| Benchmarking Simulation-Based Inference | Code | 1 |
| Benchmarking Large Language Models for Automated Verilog RTL Code Generation | Code | 1 |
| Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Code | 1 |
| A Reinforcement Learning Environment for Multi-Service UAV-enabled Wireless Systems | Code | 1 |
| CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery | Code | 1 |
| Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework | Code | 1 |
| Hierarchical graph neural nets can capture long-range interactions | Code | 1 |
| Benchmarking Language Models for Code Syntax Understanding | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |