SOTAVerified

Benchmarking

Papers

Showing 2076-2100 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases | Code | 0 |
| IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification | Code | 0 |
| Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data | Code | 0 |
| IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0 |
| AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Code | 0 |
| HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Code | 0 |
| Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0 |
| Impact of ImageNet Model Selection on Domain Adaptation | Code | 0 |
| Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch | Code | 0 |
| Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0 |
| AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0 |
| Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Code | 0 |
| ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles | Code | 0 |
| Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn | Code | 0 |
| Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods Multimodal Optimization | Code | 0 |
| Hybrid Machine Learning Models of Classifying Residential Requests for Smart Dispatching | Code | 0 |
| Hybrid Random Features | Code | 0 |
| HuSc3D: Human Sculpture dataset for 3D object reconstruction | Code | 0 |
| Hyperparameter-Free Losses for Model-Based Monocular Reconstruction | Code | 0 |
| Benchmarking the Fairness of Image Upsampling Methods | Code | 0 |
| AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements | Code | 0 |
| Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Code | 0 |
| Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models | Code | 0 |
| HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models | Code | 0 |
| HRNET: AI on Edge for mask detection and social distancing | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |