| Visually Descriptive Language Model for Vector Graphics Reasoning | Apr 9, 2024 | DescriptiveLanguage Modeling | CodeCode Available | 9 | 5 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive LearningDescriptive | CodeCode Available | 7 | 5 |
| AudioGen: Textually Guided Audio Generation | Sep 30, 2022 | Audio GenerationDescriptive | CodeCode Available | 6 | 5 |
| Fundamental Components of Deep Learning: A category-theoretic approach | Mar 13, 2024 | Deep LearningDescriptive | CodeCode Available | 5 | 5 |
| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Apr 15, 2024 | Contrastive LearningDescriptive | CodeCode Available | 3 | 5 |
| Descriptive Image Quality Assessment in the Wild | May 29, 2024 | DescriptiveImage Quality Assessment | CodeCode Available | 3 | 5 |
| Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change DetectionDescriptive | CodeCode Available | 3 | 5 |
| Ultra-High-Resolution Image Synthesis: Data, Method and Evaluation | Jun 2, 2025 | 4kDescriptive | CodeCode Available | 3 | 5 |
| Fine-Tuning Language Models from Human Preferences | Sep 18, 2019 | DescriptiveLanguage Modelling | CodeCode Available | 3 | 5 |
| A Survey on Self-Supervised Learning for Non-Sequential Tabular Data | Feb 2, 2024 | Contrastive LearningDescriptive | CodeCode Available | 3 | 5 |
| ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation | Sep 20, 2024 | DescriptiveQuestion Answering | CodeCode Available | 3 | 5 |
| PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning | Nov 21, 2022 | 3D Classification3D Object Detection | CodeCode Available | 2 | 5 |
| ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model | Jun 11, 2025 | cross-modal alignmentDescriptive | CodeCode Available | 2 | 5 |
| Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Mar 28, 2025 | DescriptiveImage Quality Assessment | CodeCode Available | 2 | 5 |
| MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Jun 17, 2024 | DescriptiveMedical Diagnosis | CodeCode Available | 2 | 5 |
| K-LITE: Learning Transferable Visual Models with External Knowledge | Apr 20, 2022 | BenchmarkingDescriptive | CodeCode Available | 2 | 5 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Dec 1, 2022 | DecoderDense Captioning | CodeCode Available | 2 | 5 |
| Language-driven Semantic Segmentation | Jan 10, 2022 | DescriptiveFew-Shot Semantic Segmentation | CodeCode Available | 2 | 5 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Dec 5, 2024 | DescriptiveVisual Question Answering | CodeCode Available | 2 | 5 |
| DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Jul 4, 2024 | DescriptiveDiversity | CodeCode Available | 2 | 5 |
| Fine-grained Image Captioning with CLIP Reward | May 26, 2022 | Caption GenerationDescriptive | CodeCode Available | 2 | 5 |
| FlashSloth : Lightning Multimodal Large Language Models via Embedded Visual Compression | Jan 1, 2025 | Descriptive | CodeCode Available | 2 | 5 |
| Customization Assistant for Text-to-image Generation | Dec 5, 2023 | DescriptiveImage Generation | CodeCode Available | 2 | 5 |
| An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control | Mar 7, 2024 | Descriptive | CodeCode Available | 2 | 5 |
| Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models | Dec 14, 2023 | DescriptiveImage Quality Assessment | CodeCode Available | 2 | 5 |