SOTAVerified

Benchmarking

Papers

Showing 976-1000 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Delving into Out-of-Distribution Detection with Medical Vision-Language Models | Code | 1 |
| Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks | Code | 1 |
| DependEval: Benchmarking LLMs for Repository Dependency Understanding | Code | 1 |
| Benchmarking Offline Reinforcement Learning on Real-Robot Hardware | Code | 1 |
| Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery | Code | 1 |
| Descending through a Crowded Valley — Benchmarking Deep Learning Optimizers | Code | 1 |
| Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition | Code | 1 |
| Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging | Code | 1 |
| Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Code | 1 |
| MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery | Code | 1 |
| Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Code | 1 |
| API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Code | 1 |
| Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1 |
| Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Code | 1 |
| Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Code | 1 |
| Explainable Benchmarking for Iterative Optimization Heuristics | Code | 1 |
| DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations | Code | 1 |
| NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks | Code | 1 |
| NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search | Code | 1 |
| Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations | Code | 1 |
| Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT | Code | 1 |
| DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity | Code | 1 |
| DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information | Code | 1 |
| Protein Structure Tokenization: Benchmarking and New Recipe | Code | 1 |
| Benchmarking Neural Network Generalization for Grammar Induction | Code | 1 |
Page 40 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |