Visual Instruction Tuning | Apr 17, 2023 | 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA) | Code Available: 6
DINOv2: Learning Robust Visual Features without Supervision | Apr 14, 2023 | Depth Estimation; Domain Generalization | Code Available: 6
GPT-4 Technical Report | Mar 15, 2023 | answerability prediction; Arithmetic Reasoning | Code Available: 6
CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching; FS-MEVQA | Code Available: 5
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Nov 2, 2022 | Contrastive Learning; image-classification | Code Available: 5
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Sep 14, 2024 | Contrastive Learning; Image Retrieval | Code Available: 4
Long-CLIP: Unlocking the Long-Text Capability of CLIP | Mar 22, 2024 | Image Generation; Image Retrieval | Code Available: 4
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Mar 7, 2024 | 3D Reconstruction; Image Retrieval | Code Available: 4
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification; Image Classification | Code Available: 4
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question Answering; Image Captioning | Code Available: 4
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Nov 12, 2022 | Contrastive Learning; Cross-Modal Retrieval | Code Available: 4
A Comprehensive Survey on Composed Image Retrieval | Feb 19, 2025 | Attribute; Image Retrieval | Code Available: 3
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Dec 19, 2024 | Image Retrieval; Retrieval | Code Available: 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Mar 28, 2024 | Image Retrieval; Implicit Relations | Code Available: 3
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Mar 8, 2024 | 1 Image, 2*2 Stitching; Code Generation | Code Available: 3
All You Need to Know About Training Image Retrieval Models | Mar 17, 2025 | All; Image Retrieval | Code Available: 2
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Mar 12, 2025 | Image Retrieval; Retrieval | Code Available: 2
Encrypted Vector Similarity Computations Using Partially Homomorphic Encryption: Applications and Performance Analysis | Mar 7, 2025 | Image Retrieval; Privacy Preserving | Code Available: 2
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior | Mar 6, 2025 | Image Retrieval | Code Available: 2
Composed Multi-modal Retrieval: A Survey of Approaches and Applications | Mar 3, 2025 | Cross-Modal Retrieval; Data Augmentation | Code Available: 2
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Feb 18, 2025 | Image Retrieval; Question Answering | Code Available: 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Feb 6, 2025 | image-classification; Image Classification | Code Available: 2
Vision Foundation Models for Computed Tomography | Jan 15, 2025 | Computed Tomography (CT); Contrastive Learning | Code Available: 2
Where am I? Cross-View Geo-localization with Natural Language Descriptions | Dec 22, 2024 | geo-localization; Image Retrieval | Code Available: 2
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | Dec 15, 2024 | Image Retrieval; Retrieval | Code Available: 2
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Nov 4, 2024 | Image Retrieval; Reranking | Code Available: 2
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Oct 29, 2024 | Image Retrieval; RAG | Code Available: 2
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | Oct 28, 2024 | Image Retrieval; Image to text | Code Available: 2
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Aug 10, 2024 | geo-localization; Image Retrieval | Code Available: 2
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Jul 11, 2024 | Image Retrieval; Image to text | Code Available: 2
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Jun 17, 2024 | Benchmarking | Code Available: 2
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model | Jun 13, 2024 | Diagnostic; Image Retrieval | Code Available: 2
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | Jun 13, 2024 | Contrastive Learning; Image Retrieval | Code Available: 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination; Image Description | Code Available: 2
Composed Image Retrieval for Remote Sensing | May 24, 2024 | Composed Image Retrieval (CoIR); Descriptive | Code Available: 2
EMR-Merging: Tuning-Free High-Performance Model Merging | May 23, 2024 | Image Classification; Image Retrieval | Code Available: 2
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | May 5, 2024 | Benchmarking; Composed Image Retrieval (CoIR) | Code Available: 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Apr 28, 2024 | Cross-Modal Retrieval; Image Retrieval | Code Available: 2
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Apr 9, 2024 | Image Retrieval; Object | Code Available: 2
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | Mar 17, 2024 | Image Retrieval; Retrieval | Code Available: 2
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space | Mar 11, 2024 | Data Augmentation; Disaster Response | Code Available: 2
Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models | Mar 4, 2024 | Image Retrieval; Retrieval | Code Available: 2
Local Feature Matching Using Deep Learning: A Survey | Jan 31, 2024 | 3D Reconstruction; Deep Learning | Code Available: 2
Language-only Training of Zero-shot Composed Image Retrieval | Jan 1, 2024 | Image Retrieval; Retrieval | Code Available: 2
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval | Jan 1, 2024 | Image Retrieval; Retrieval | Code Available: 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Jan 1, 2024 | cross-modal alignment; Cross-Modal Retrieval | Code Available: 2
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | Sep 27, 2023 | Contrastive Learning; geo-localization | Code Available: 2
Optimization of Rank Losses for Image Retrieval | Sep 15, 2023 | Image Retrieval; Retrieval | Code Available: 2
NLLB-CLIP -- train performant multilingual image retrieval model on a budget | Sep 4, 2023 | Image Retrieval; Retrieval | Code Available: 2
Global Features are All You Need for Image Retrieval and Reranking | Aug 14, 2023 | All; Image Retrieval | Code Available: 2