SOTAVerified

Benchmarking

Papers

Showing 111–120 of 5548 papers

Title | Status | Hype
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models | Code | 3
Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving | Code | 3
A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation | Code | 3
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms | Code | 3
Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+ | Code | 3
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances | Code | 3
DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks | Code | 3
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Code | 3
A Survey on Performance Metrics for Object-Detection Algorithms | Code | 3
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified