| Advanced Multimodal Deep Learning Architecture for Image-Text Matching | Jun 13, 2024 | Deep Learning, Image-text matching | Unverified | 0 |
| Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | Jun 5, 2024 | cross-modal alignment, Image-text matching | Unverified | 0 |
| Robust Interaction-Based Relevance Modeling for Online e-Commerce Search | Jun 4, 2024 | Retrieval, Sentence | Code Available | 0 |
| FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs | Jun 3, 2024 | Fact Checking, Fact Verification | Code Available | 0 |
| Hybrid-Learning Video Moment Retrieval across Multi-Domain Labels | Jun 3, 2024 | Moment Retrieval, Retrieval | Unverified | 0 |
| LLMs and Memorization: On Quality and Specificity of Copyright Compliance | May 28, 2024 | Hallucination, Memorization | Code Available | 0 |
| DEMO: A Statistical Perspective for Efficient Image-Text Matching | May 19, 2024 | Image-text matching, Model Optimization | Unverified | 0 |
| Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | May 16, 2024 | AudioCaps, Event Detection | Code Available | 1 |
| Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study | May 15, 2024 | Content-Based Image Retrieval, Image Retrieval | Unverified | 0 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Unverified | 0 |
| RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | May 11, 2024 | Image-text matching, Retrieval | Unverified | 0 |
| COM3D: Leveraging Cross-View Correspondence and Cross-Modal Mining for 3D Retrieval | May 7, 2024 | Cross-Modal Retrieval, Retrieval | Unverified | 0 |
| Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching | Apr 29, 2024 | Cross-modal retrieval with noisy correspondence, Image-text matching | Unverified | 0 |
| Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching | Apr 28, 2024 | Contrastive Learning, Image-text matching | Code Available | 1 |
| Modeling Selective Feature Attention for Representation-based Siamese Text Matching | Apr 25, 2024 | Text Matching | Code Available | 0 |
| FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction | Apr 23, 2024 | Hallucination, Image Generation | Unverified | 0 |
| Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Apr 22, 2024 | Action Quality Assessment, multimodal interaction | Code Available | 1 |
| Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Apr 11, 2024 | Decoder, Dense Video Captioning | Code Available | 2 |
| Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems | Apr 2, 2024 | Machine Translation, Retrieval | Unverified | 0 |
| SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining | Apr 1, 2024 | Contrastive Learning, Image-text matching | Unverified | 0 |
| Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | Mar 29, 2024 | Image-text matching, Object Recognition | Unverified | 0 |
| Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching | Mar 29, 2024 | Natural Language Understanding, Text Matching | Unverified | 0 |
| FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues | Mar 29, 2024 | Image-text matching, Language Modeling | Unverified | 0 |
| PointCloud-Text Matching: Benchmark Datasets and a Baseline | Mar 28, 2024 | Contrastive Learning, Retrieval | Unverified | 0 |
| RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training | Mar 15, 2024 | Diagnostic, image-classification | Code Available | 1 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available | 0 |
| Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification | Mar 5, 2024 | Few-Shot Relation Classification, Relation | Unverified | 0 |
| Image-Text Matching with Multi-View Attention | Feb 27, 2024 | Diversity, Image-text matching | Unverified | 0 |
| Multi-Intent Attribute-Aware Text Matching in Searching | Feb 12, 2024 | Attribute, Text Matching | Unverified | 0 |
| ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation | Feb 7, 2024 | Image Generation, Image-text matching | Code Available | 1 |
| MouSi: Poly-Visual-Expert Vision-Language Models | Jan 30, 2024 | Image Segmentation, Image-text matching | Code Available | 2 |
| Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking | Jan 29, 2024 | Image-text matching, Text Matching | Code Available | 0 |
| Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching, Image-text Retrieval | Code Available | 0 |
| GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition | Jan 18, 2024 | Action Recognition, Text Matching | Unverified | 0 |
| Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech | Jan 12, 2024 | Contrastive Learning, Keyword Spotting | Unverified | 0 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Jan 5, 2024 | Image Comprehension, Image to text | Code Available | 0 |
| SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling | Jan 1, 2024 | Semantic Role Labeling, Text Matching | Unverified | 0 |
| Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP | Jan 1, 2024 | Backdoor Attack, Contrastive Learning | Code Available | 0 |
| A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking | Dec 19, 2023 | Entity Linking, Text Matching | Code Available | 0 |
| OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization | Dec 7, 2023 | Adversarial Attack, Data Augmentation | Unverified | 0 |
| CILF-CIAE: CLIP-driven Image-Language Fusion for Correcting Inverse Age Estimation | Dec 4, 2023 | Age Estimation, Image-text matching | Unverified | 0 |
| Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | Nov 28, 2023 | Image Captioning, Image-text matching | Code Available | 1 |
| Tracing Influence at Scale: A Contrastive Learning Approach to Linking Public Comments and Regulator Responses | Nov 24, 2023 | Contrastive Learning, Language Modeling | Unverified | 0 |
| MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Nov 16, 2023 | Binary Classification, Descriptive | Code Available | 1 |
| Active Mining Sample Pair Semantics for Image-text Matching | Nov 9, 2023 | Active Learning, Image-text matching | Unverified | 0 |
| A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | Oct 27, 2023 | Image Segmentation, Referring Expression | Unverified | 0 |
| Cross-modal Active Complementary Learning with Self-refining Correspondence | Oct 26, 2023 | Cross-modal retrieval with noisy correspondence, Image-text matching | Code Available | 1 |
| CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting | Oct 24, 2023 | Image Segmentation, object-detection | Unverified | 0 |
| Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification | Oct 17, 2023 | Image Retrieval, Image-text matching | Unverified | 0 |