Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis (May 14, 2025). Denoising; Depth Estimation.
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models (Mar 27, 2024). Image Classification; Image Comprehension.
MambaVision: A Hybrid Mamba-Transformer Vision Backbone (Jul 10, 2024). Image Classification; Instance Segmentation.
AutoTrain: No-code training for state-of-the-art models (Oct 21, 2024). Classification; image-classification.
MambaOut: Do We Really Need Mamba for Vision? (May 13, 2024). image-classification; Image Classification.
Visual-RFT: Visual Reinforcement Fine-Tuning (Mar 3, 2025). Few-Shot Object Detection; Fine-Grained Image Classification.
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (Oct 14, 2023). Image Classification; Image Description.
DINOv2: Learning Robust Visual Features without Supervision (Apr 14, 2023). Depth Estimation; Domain Generalization.
Visual Instruction Tuning (Apr 17, 2023). 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA).
Improved Baselines with Visual Instruction Tuning (Oct 5, 2023). Factual Inconsistency Detection in Chart Captioning; Image Classification.
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution (Jul 12, 2023). Fairness; Image Classification.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (May 27, 2022). 16k; 4k.
A ConvNet for the 2020s (Jan 10, 2022). Classification; Domain Generalization.
Efficient Multimodal Learning from Data-centric Perspective (Feb 18, 2024). Image Classification; Referring Expression Comprehension.
Sequencer: Deep LSTM for Image Classification (May 4, 2022). Domain Generalization; image-classification.
Multimodal Autoregressive Pre-training of Large Vision Encoders (Nov 21, 2024). Decoder; Image Classification.
Scalable Pre-training of Large Autoregressive Image Models (Jan 16, 2024). Image Classification.
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively (Jan 5, 2024). image-classification; Image Classification.
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese (Nov 2, 2022). Contrastive Learning; image-classification.
MedMamba: Vision Mamba for Medical Image Classification (Mar 6, 2024). Classification; image-classification.
Catastrophic Forgetting in Deep Learning: A Comprehensive Taxonomy (Dec 16, 2023). Deep Learning; image-classification.
Wavelet Convolutions for Large Receptive Fields (Jul 8, 2024). 2D Object Detection; 2D Semantic Segmentation.
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies (Jul 1, 2024). image-classification; Image Classification.
InceptionNeXt: When Inception Meets ConvNeXt (Mar 29, 2023). Image Classification; Semantic Segmentation.
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day (Jun 1, 2023). Image Classification; Instruction Following.
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (Nov 10, 2022). 2D Object Detection; Classification.
Kolmogorov-Arnold Transformer (Sep 16, 2024). Image Classification.
Vision GNN: An Image is Worth Graph of Nodes (Jun 1, 2022). Image Classification; Object Detection.
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures (Mar 4, 2024). image-classification; Image Classification.
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models (Apr 19, 2022). Fairness; Few-Shot Image Classification.
Visual Attention Network (Feb 20, 2022). image-classification; Image Classification.
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning (May 23, 2025). Decoder; Image Captioning.
Benchopt: Reproducible, efficient and collaborative optimization benchmarks (Jun 27, 2022). Benchmarking; image-classification.
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything (Dec 1, 2023). Decoder; image-classification.
Detectron2 Object Detection & Manipulating Images using Cartoonization (Aug 1, 2021). Autonomous Vehicles; Data Visualization.
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels (Feb 27, 2025). Image Classification; Instance Segmentation.
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN (May 27, 2022). Image Classification; Instance Segmentation.
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities (Nov 12, 2022). Contrastive Learning; Cross-Modal Retrieval.
Deep Residual Learning for Image Recognition (Dec 10, 2015). Classification.
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach (Aug 31, 2020). Data Augmentation; Image Classification.
Efficient Post-training Quantization with FP8 Formats (Sep 26, 2023). image-classification; Image Classification.
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (Feb 1, 2023). Action Classification; Image Classification.
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction (May 29, 2022). Autonomous Driving; CPU.
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications (Jan 11, 2024). image-classification; Image Classification.
RegNet: Self-Regulated Network for Image Classification (Jan 3, 2021). Classification; General Classification.
MaxViT: Multi-Axis Vision Transformer (Apr 4, 2022). image-classification; Image Classification.
MetaFormer Baselines for Vision (Oct 24, 2022). Domain Generalization; Image Classification.
Ludwig: a type-based declarative deep learning toolbox (Sep 17, 2019). Decoder; Deep Learning.
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey (Feb 8, 2024). Articles; Entity Alignment.
Cascade Prompt Learning for Vision-Language Model Adaptation (Sep 26, 2024). General Knowledge; image-classification.