SOTAVerified

Benchmarking

Papers

Showing 781-790 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Robust Latent Matters: Boosting Image Generation with Sampling Error | Code | 3 |
| Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking | | 0 |
| ResBench: Benchmarking LLM-Generated FPGA Designs with Resource Awareness | | 0 |
| Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0 |
| Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models | | 0 |
| Benchmarking Chinese Medical LLMs: A Medbench-based Analysis of Performance Gaps and Hierarchical Optimization Strategies | | 0 |
| MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | Code | 2 |
| Illuminating Darkness: Enhancing Real-world Low-light Scenes with Smartphone Images | Code | 1 |
| Skelite: Compact Neural Networks for Efficient Iterative Skeletonization | Code | 0 |
| Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |