Visual Instruction Tuning | Apr 17, 2023 | 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA) | Code available | 6
DINOv2: Learning Robust Visual Features without Supervision | Apr 14, 2023 | Depth Estimation; Domain Generalization | Code available | 6
GPT-4 Technical Report | Mar 15, 2023 | answerability prediction; Arithmetic Reasoning | Code available | 6
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Nov 2, 2022 | Contrastive Learning; image-classification | Code available | 5
CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching; FS-MEVQA | Code available | 5
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Sep 14, 2024 | Contrastive Learning; Image Retrieval | Code available | 4
Long-CLIP: Unlocking the Long-Text Capability of CLIP | Mar 22, 2024 | Image Generation; Image Retrieval | Code available | 4
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Jan 30, 2023 | Generative Visual Question Answering; Image Captioning | Code available | 4
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification; Image Classification | Code available | 4
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Nov 12, 2022 | Contrastive Learning; Cross-Modal Retrieval | Code available | 4
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Mar 7, 2024 | 3D Reconstruction; Image Retrieval | Code available | 4
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Dec 19, 2024 | Image Retrieval; Retrieval | Code available | 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Mar 28, 2024 | Image Retrieval; Implicit Relations | Code available | 3
A Comprehensive Survey on Composed Image Retrieval | Feb 19, 2025 | Attribute; Image Retrieval | Code available | 3
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Mar 8, 2024 | 1 Image, 2*2 Stitching; Code Generation | Code available | 3
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | Mar 17, 2024 | Image Retrieval; Retrieval | Code available | 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Jan 1, 2024 | cross-modal alignment; Cross-Modal Retrieval | Code available | 2
MixVPR: Feature Mixing for Visual Place Recognition | Mar 3, 2023 | Autonomous Driving; Image Retrieval | Code available | 2
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Jul 11, 2024 | Image Retrieval; Image to text | Code available | 2
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Nov 4, 2024 | Image Retrieval; Reranking | Code available | 2
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | May 5, 2024 | Benchmarking; Composed Image Retrieval (CoIR) | Code available | 2
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Apr 9, 2024 | Image Retrieval; Object | Code available | 2
Global Features are All You Need for Image Retrieval and Reranking | Aug 14, 2023 | All; Image Retrieval | Code available | 2
Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark | May 31, 2022 | Autonomous Driving; Camera Pose Estimation | Code available | 2
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior | Mar 6, 2025 | Image Retrieval | Code available | 2
Language-only Training of Zero-shot Composed Image Retrieval | Jan 1, 2024 | Image Retrieval; Retrieval | Code available | 2
Fine-grained Image Captioning with CLIP Reward | May 26, 2022 | Caption Generation; Descriptive | Code available | 2
Local Feature Matching Using Deep Learning: A Survey | Jan 31, 2024 | 3D Reconstruction; Deep Learning | Code available | 2
FastReID: A Pytorch Toolbox for General Instance Re-identification | Jun 4, 2020 | Face Recognition; GPU | Code available | 2
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval | Apr 21, 2022 | Cross-Modal Retrieval; Image Retrieval | Code available | 2
EMR-Merging: Tuning-Free High-Performance Model Merging | May 23, 2024 | Image Classification; Image Retrieval | Code available | 2
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Mar 12, 2025 | Image Retrieval; Retrieval | Code available | 2
Grounding Language Models to Images for Multimodal Inputs and Outputs | Jan 31, 2023 | Image Retrieval; In-Context Learning | Code available | 2
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval | Jan 1, 2024 | Image Retrieval; Retrieval | Code available | 2
All You Need to Know About Training Image Retrieval Models | Mar 17, 2025 | All; Image Retrieval | Code available | 2
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Aug 10, 2024 | geo-localization; Image Retrieval | Code available | 2
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space | Mar 11, 2024 | Data Augmentation; Disaster Response | Code available | 2
Composed Image Retrieval for Remote Sensing | May 24, 2024 | Composed Image Retrieval (CoIR); Descriptive | Code available | 2
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Oct 29, 2024 | Image Retrieval; RAG | Code available | 2
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment | Apr 28, 2024 | Cross-Modal Retrieval; Image Retrieval | Code available | 2
Encrypted Vector Similarity Computations Using Partially Homomorphic Encryption: Applications and Performance Analysis | Mar 7, 2025 | Image Retrieval; Privacy Preserving | Code available | 2
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model | Jun 13, 2024 | Diagnostic; Image Retrieval | Code available | 2
Composed Multi-modal Retrieval: A Survey of Approaches and Applications | Mar 3, 2025 | Cross-Modal Retrieval; Data Augmentation | Code available | 2
CLEAR: A Fully User-side Image Search System | Jun 17, 2022 | Image Retrieval; Privacy Preserving | Code available | 2
Generating Images with Multimodal Language Models | May 26, 2023 | Decoder; Image Generation | Code available | 2
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | Sep 27, 2023 | Contrastive Learning; geo-localization | Code available | 2
AnyLoc: Towards Universal Visual Place Recognition | Aug 1, 2023 | Image Retrieval; Visual Place Recognition | Code available | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Feb 6, 2025 | image-classification; Image Classification | Code available | 2
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | May 11, 2023 | 1 Image, 2*2 Stitching; Diversity | Code available | 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Jun 11, 2024 | Hallucination; Image Description | Code available | 2