| Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | Jul 1, 2024 | cross-modal alignment, Image Retrieval | Unverified | 0 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | Jun 25, 2024 | cross-modal alignment, Moment Retrieval | Unverified | 0 |
| Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Jun 25, 2024 | cross-modal alignment, Image Classification | Code Available | 2 |
| PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes | Jun 19, 2024 | cross-modal alignment | Code Available | 1 |
| Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Jun 12, 2024 | cross-modal alignment, Language Modelling | Code Available | 3 |
| It is Never Too Late to Mend: Separate Learning for Multimedia Recommendation | Jun 12, 2024 | cross-modal alignment, Multimedia recommendation | Code Available | 0 |
| MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction | Jun 7, 2024 | cross-modal alignment, Prediction | Code Available | 1 |
| Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | Jun 5, 2024 | cross-modal alignment, Image-text matching | Unverified | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | Jun 4, 2024 | cross-modal alignment, Graph Attention | Unverified | 0 |
| Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection | Jun 2, 2024 | 3D Object Detection, cross-modal alignment | Code Available | 3 |