SOTAVerified

Multimodal Text and Image Classification

Classification using both image and text as input

Papers

Showing 1-7 of 7 papers

Title | Status | Hype
Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices | Code | 1
CMA-CLIP: Cross-Modality Attention CLIP for Image-Text Classification | - | 0
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge | Code | 1
Image and Text fusion for UPMC Food-101 using BERT and CNNs | Code | 1
Multimodal price prediction | - | 0
Analysis of Social Media Data using Multimodal Deep Learning for Disaster Response | Code | 1
Are These Birds Similar: Learning Branched Networks for Fine-grained Representations | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Early Fusion (Bert + InceptionV3) | Accuracy (%) | 92.5 | - | Unverified
2 | Late Fusion (Bert + InceptionV3) | Accuracy (%) | 84.59 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Convolutional image feature extraction and dense concatenating | Accuracy | 88 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Two Branch Network (Text - Bert + Image - Nts-Net) | Accuracy | 96.81 | - | Unverified