SOTAVerified

Benchmarking

Papers

Showing 441-450 of 5548 papers

Title | Status | Hype
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks | Code | 1
Benchmarking Deep Learning Interpretability in Time Series Predictions | Code | 1
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning | Code | 1
Clinical Prompt Learning with Frozen Language Models | Code | 1
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework | Code | 1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Code | 1
An Exploration of Embodied Visual Exploration | Code | 1
Benchmarking Data Science Agents | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified