SOTAVerified

Benchmarking

Papers

Showing 2051-2100 of 5548 papers

Title | Status | Hype
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge | Code | 0
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction | Code | 0
Learning from Integral Losses in Physics Informed Neural Networks | Code | 0
Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers | Code | 0
Benchmarking TPU, GPU, and CPU Platforms for Deep Learning | Code | 0
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Code | 0
A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation | Code | 0
Benchmarking Top-K Keyword and Top-K Document Processing with T^2K^2 and T^2K^2D^2 | Code | 0
Benchmarking tools for a priori identifiability analysis | Code | 0
Automatic benchmarking of large multimodal models via iterative experiment programming | Code | 0
Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | Code | 0
Benchmarking time series classification -- Functional data vs machine learning approaches | Code | 0
A Linear Constrained Optimization Benchmark For Probabilistic Search Algorithms: The Rotated Klee-Minty Problem | Code | 0
A Continuous Information Gain Measure to Find the Most Discriminatory Problems for AI Benchmarking | Code | 0
Illuminating the Diversity-Fitness Trade-Off in Black-Box Optimization | Code | 0
Benchmarking the Robustness of UAV Tracking Against Common Corruptions | Code | 0
Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Code | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
IHCV: Discovery of Hidden Time-Dependent Control Variables in Non-Linear Dynamical Systems | Code | 0
IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C) | Code | 0
Benchmarking the Robustness of Optical Flow Estimation to Corruptions | Code | 0
Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification | Code | 0
A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks | Code | 0
Identifying Money Laundering Subgraphs on the Blockchain | Code | 0
Identifying the Smallest Adversarial Load Perturbations that Render DC-OPF Infeasible | Code | 0
Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases | Code | 0
IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification | Code | 0
Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data | Code | 0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation | Code | 0
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Code | 0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs | Code | 0
Identifying and Benchmarking Natural Out-of-Context Prediction Problems | Code | 0
Impact of ImageNet Model Selection on Domain Adaptation | Code | 0
Benchmarking the Linear Algebra Awareness of TensorFlow and PyTorch | Code | 0
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0
Benchmarking the Hooke-Jeeves Method, MTS-LS1, and BSrr on the Large-scale BBOB Function Set | Code | 0
ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles | Code | 0
Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn | Code | 0
Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods Multimodal Optimization | Code | 0
Hybrid Machine Learning Models of Classifying Residential Requests for Smart Dispatching | Code | 0
Hybrid Random Features | Code | 0
HuSc3D: Human Sculpture dataset for 3D object reconstruction | Code | 0
Hyperparameter-Free Losses for Model-Based Monocular Reconstruction | Code | 0
Benchmarking the Fairness of Image Upsampling Methods | Code | 0
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements | Code | 0
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Code | 0
Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models | Code | 0
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models | Code | 0
HRNET: AI on Edge for mask detection and social distancing | Code | 0
Page 42 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified