MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Jul 10, 2024 | Image Classification; Instance Segmentation
Code Available | 7 | MambaOut: Do We Really Need Mamba for Vision? | May 13, 2024 | image-classification; Image Classification
Code Available | 7 | Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | May 14, 2025 | Denoising; Depth Estimation
Code Available | 7 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Oct 14, 2023 | Image Classification; Image Description
Code Available | 7 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mar 27, 2024 | Image Classification; Image Comprehension
Code Available | 7 | AutoTrain: No-code training for state-of-the-art models | Oct 21, 2024 | Classification; image-classification
Code Available | 7 | Visual-RFT: Visual Reinforcement Fine-Tuning | Mar 3, 2025 | Few-Shot Object Detection; Fine-Grained Image Classification
Code Available | 7 | Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | Fairness; Image Classification
Code Available | 6 | DINOv2: Learning Robust Visual Features without Supervision | Apr 14, 2023 | Depth Estimation; Domain Generalization
Code Available | 6 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | May 27, 2022 | 16k; 4k
Code Available | 6 | Visual Instruction Tuning | Apr 17, 2023 | 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA)
Code Available | 6 | Improved Baselines with Visual Instruction Tuning | Oct 5, 2023 | Factual Inconsistency Detection in Chart Captioning; Image Classification
Code Available | 6 | A ConvNet for the 2020s | Jan 10, 2022 | Classification; Domain Generalization
Code Available | 5 | Efficient Multimodal Learning from Data-centric Perspective | Feb 18, 2024 | Image Classification; Referring Expression Comprehension
Code Available | 5 | Multimodal Autoregressive Pre-training of Large Vision Encoders | Nov 21, 2024 | Decoder; Image Classification
Code Available | 5 | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Jan 5, 2024 | image-classification; Image Classification
Code Available | 5 | Scalable Pre-training of Large Autoregressive Image Models | Jan 16, 2024 | Image Classification
Code Available | 5 | Sequencer: Deep LSTM for Image Classification | May 4, 2022 | Domain Generalization; image-classification
Code Available | 5 | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Nov 2, 2022 | Contrastive Learning; image-classification
Code Available | 5 | MedMamba: Vision Mamba for Medical Image Classification | Mar 6, 2024 | Classification; image-classification
Code Available | 4 | Catastrophic Forgetting in Deep Learning: A Comprehensive Taxonomy | Dec 16, 2023 | Deep Learning; image-classification
Code Available | 4 | Wavelet Convolutions for Large Receptive Fields | Jul 8, 2024 | 2D Object Detection; 2D Semantic Segmentation
Code Available | 4 | Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies | Jul 1, 2024 | image-classification; Image Classification
Code Available | 4 | InceptionNeXt: When Inception Meets ConvNeXt | Mar 29, 2023 | Image Classification; Semantic Segmentation
Code Available | 4 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | Nov 10, 2022 | 2D Object Detection; Classification
Code Available | 4 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Jun 1, 2023 | Image Classification; Instruction Following
Code Available | 4 | Kolmogorov-Arnold Transformer | Sep 16, 2024 | Image Classification
Code Available | 4 | Vision GNN: An Image is Worth Graph of Nodes | Jun 1, 2022 | Image Classification; Object Detection
Code Available | 4 | Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures | Mar 4, 2024 | image-classification; Image Classification
Code Available | 4 | ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Apr 19, 2022 | Fairness; Few-Shot Image Classification
Code Available | 4 | Visual Attention Network | Feb 20, 2022 | image-classification; Image Classification
Code Available | 4 | Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning | May 23, 2025 | Decoder; Image Captioning
Code Available | 4 | Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications | Jan 11, 2024 | image-classification; Image Classification
Code Available | 4 | EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Dec 1, 2023 | Decoder; image-classification
Code Available | 4 | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Nov 12, 2022 | Contrastive Learning; Cross-Modal Retrieval
Code Available | 4 | Detectron2 Object Detection & Manipulating Images using Cartoonization | Aug 1, 2021 | Autonomous Vehicles; Data Visualization
Code Available | 4 | OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Feb 27, 2025 | Image Classification; Instance Segmentation
Code Available | 4 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | Feb 1, 2023 | Action Classification; Image Classification
Code Available | 4 | A Framework For Contrastive Self-Supervised Learning And Designing A New Approach | Aug 31, 2020 | Data Augmentation; Image Classification
Code Available | 4 | Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | May 27, 2022 | Image Classification; Instance Segmentation
Code Available | 4 | Efficient Post-training Quantization with FP8 Formats | Sep 26, 2023 | image-classification; Image Classification
Code Available | 4 | Benchopt: Reproducible, efficient and collaborative optimization benchmarks | Jun 27, 2022 | Benchmarking; image-classification
Code Available | 4 | Deep Residual Learning for Image Recognition | Dec 10, 2015 | Classification
Code Available | 4 | EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving; CPU
Code Available | 4 | RegNet: Self-Regulated Network for Image Classification | Jan 3, 2021 | Classification; General Classification
Code Available | 4 | MaxViT: Multi-Axis Vision Transformer | Apr 4, 2022 | image-classification; Image Classification
Code Available | 3 | MetaFormer Baselines for Vision | Oct 24, 2022 | Domain Generalization; Image Classification
Code Available | 3 | Ludwig: a type-based declarative deep learning toolbox | Sep 17, 2019 | Decoder; Deep Learning
Code Available | 3 | Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey | Feb 8, 2024 | Articles; Entity Alignment
Code Available | 3 | Cascade Prompt Learning for Vision-Language Model Adaptation | Sep 26, 2024 | General Knowledge; image-classification
Code Available | 3