| Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Dec 19, 2023 | cross-modal alignment, Moment Retrieval | Code available | 1 |
| Mask Grounding for Referring Image Segmentation | Dec 19, 2023 | cross-modal alignment, Image Segmentation | Code available | 1 |
| ViLA: Efficient Video-Language Alignment for Video Question Answering | Dec 13, 2023 | cross-modal alignment, Language Modeling | Code available | 1 |
| Navigating Open Set Scenarios for Skeleton-based Action Recognition | Dec 11, 2023 | Action Recognition, Activity Recognition | Code available | 1 |
| Progressive Multi-Modality Learning for Inverse Protein Folding | Dec 11, 2023 | cross-modal alignment, Data Augmentation | Code available | 1 |
| ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks | Oct 4, 2023 | cross-modal alignment | Code available | 1 |
| VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models | Sep 28, 2023 | Backdoor Attack, cross-modal alignment | Code available | 1 |
| Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition | Sep 18, 2023 | Action Recognition, cross-modal alignment | Code available | 1 |
| Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models | Aug 25, 2023 | cross-modal alignment, Position | Code available | 1 |
| Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation | Aug 24, 2023 | cross-modal alignment, Descriptive | Code available | 1 |