| Face2Text revisited: Improved data set and baseline results | May 24, 2022 | Image Description, Transfer Learning | Unverified | 0 |
| Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation | May 2, 2022 | Image Description, Machine Translation | Unverified | 0 |
| Multimodal fusion via cortical network inspired losses | May 1, 2022 | Emotion Recognition, Image Description | Unverified | 0 |
| UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Nov 23, 2021 | Image Captioning, Image Description | Code Available | 1 |
| Neural Dependency Coding inspired Multimodal Fusion | Sep 28, 2021 | Emotion Recognition, Image Description | Unverified | 0 |
| CIDEr-R: Robust Consensus-based Image Description Evaluation | Sep 28, 2021 | Descriptive, Image Description | Unverified | 0 |
| Cross Modification Attention Based Deliberation Model for Image Captioning | Sep 17, 2021 | Decoder, Descriptive | Unverified | 0 |
| SafeAccess+: An Intelligent System to make Smart Home Safer and Americans with Disability Act Compliant | Sep 14, 2021 | Image Description | Unverified | 0 |
| Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP | Sep 6, 2021 | Image Description, Out-of-Distribution Detection | Code Available | 1 |
| Revisiting Binary Local Image Description for Resource Limited Devices | Aug 18, 2021 | Image Description, Triplet | Code Available | 1 |