| FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction | Apr 23, 2024 | Hallucination, Image Generation | Unverified | 0 |
| Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems | Apr 2, 2024 | Machine Translation, Retrieval | Unverified | 0 |
| SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining | Apr 1, 2024 | Contrastive Learning, Image-text matching | Unverified | 0 |
| Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | Mar 29, 2024 | Image-text matching, Object Recognition | Unverified | 0 |
| FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues | Mar 29, 2024 | Image-text matching, Language Modeling | Unverified | 0 |
| Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching | Mar 29, 2024 | Natural Language Understanding, Text Matching | Unverified | 0 |
| PointCloud-Text Matching: Benchmark Datasets and a Baseline | Mar 28, 2024 | Contrastive Learning, Retrieval | Unverified | 0 |
| MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available | 0 |
| Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification | Mar 5, 2024 | Few-Shot Relation Classification, Relation | Unverified | 0 |
| Image-Text Matching with Multi-View Attention | Feb 27, 2024 | Diversity, Image-text matching | Unverified | 0 |
| Multi-Intent Attribute-Aware Text Matching in Searching | Feb 12, 2024 | Attribute, Text Matching | Unverified | 0 |
| Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking | Jan 29, 2024 | Image-text matching, Text Matching | Code Available | 0 |
| GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition | Jan 18, 2024 | Action Recognition, Text Matching | Unverified | 0 |
| Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching, Image-text Retrieval | Code Available | 0 |
| Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech | Jan 12, 2024 | Contrastive Learning, Keyword Spotting | Unverified | 0 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Jan 5, 2024 | Image Comprehension, Image to text | Code Available | 0 |
| Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP | Jan 1, 2024 | Backdoor Attack, Contrastive Learning | Code Available | 0 |
| SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling | Jan 1, 2024 | Semantic Role Labeling, Text Matching | Unverified | 0 |
| A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking | Dec 19, 2023 | Entity Linking, Text Matching | Code Available | 0 |
| OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization | Dec 7, 2023 | Adversarial Attack, Data Augmentation | Unverified | 0 |
| CILF-CIAE: CLIP-driven Image-Language Fusion for Correcting Inverse Age Estimation | Dec 4, 2023 | Age Estimation, Image-text matching | Unverified | 0 |
| Tracing Influence at Scale: A Contrastive Learning Approach to Linking Public Comments and Regulator Responses | Nov 24, 2023 | Contrastive Learning, Language Modeling | Unverified | 0 |
| Active Mining Sample Pair Semantics for Image-text Matching | Nov 9, 2023 | Active Learning, Image-text matching | Unverified | 0 |
| A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | Oct 27, 2023 | Image Segmentation, Referring Expression | Unverified | 0 |
| CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting | Oct 24, 2023 | Image Segmentation, Object Detection | Unverified | 0 |
| Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification | Oct 17, 2023 | Image Retrieval, Image-text matching | Unverified | 0 |
| Learning From Noisy Correspondence With Tri-Partition for Cross-Modal Matching | Sep 22, 2023 | Cross-modal retrieval with noisy correspondence, Memorization | Unverified | 0 |
| Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking | Sep 15, 2023 | Image-text matching, Re-Ranking | Unverified | 0 |
| Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks | Sep 14, 2023 | Image-text matching, Sarcasm Detection | Code Available | 0 |
| Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering | Sep 9, 2023 | Image Captioning, Image-text matching | Code Available | 0 |
| GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models' Over-Reliance on Superficial Clue | Sep 8, 2023 | Semantic Similarity, Semantic Textual Similarity | Unverified | 0 |
| Prompt-based Effective Input Reformulation for Legal Case Retrieval | Sep 6, 2023 | Retrieval, Text Matching | Code Available | 0 |
| ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation | Aug 31, 2023 | Image-text matching, Language Modeling | Unverified | 0 |
| Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition | Aug 24, 2023 | Attribute, Image-text matching | Unverified | 0 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | Unverified | 0 |
| Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models | Aug 11, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| InfeRE: Step-by-Step Regex Generation via Chain of Inference | Aug 8, 2023 | Text Matching | Code Available | 0 |
| Grounded Image Text Matching with Mismatched Relation Reasoning | Aug 2, 2023 | Image-text matching, Relation | Unverified | 0 |
| Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model | Jul 1, 2023 | Text Matching | Unverified | 0 |
| Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval | May 24, 2023 | Document Ranking, Information Retrieval | Code Available | 0 |
| PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification | May 24, 2023 | Classification, Contrastive Learning | Unverified | 0 |
| MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval | May 18, 2023 | Image-text matching, Retrieval | Code Available | 0 |
| Probing the Role of Positional Information in Vision-Language Models | May 17, 2023 | Contrastive Learning, Image-text matching | Unverified | 0 |
| Scene Text Recognition with Image-Text Matching-guided Dictionary | May 8, 2023 | Image-text matching, Language Modeling | Unverified | 0 |
| Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information | May 2, 2023 | Bayesian Inference, Image-text matching | Code Available | 0 |
| RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Apr 21, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 |
| Integrity and Junkiness Failure Handling for Embedding-based Retrieval: A Case Study in Social Network Search | Apr 18, 2023 | Retrieval, Text Matching | Unverified | 0 |
| Verbs in Action: Improving verb understanding in video-language models | Apr 13, 2023 | Contrastive Learning, Question Answering | Code Available | 0 |