| Visually Descriptive Language Model for Vector Graphics Reasoning | Apr 9, 2024 | Descriptive, Language Modeling | Code Available | 9 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive Learning, Descriptive | Code Available | 7 |
| AudioGen: Textually Guided Audio Generation | Sep 30, 2022 | Audio Generation, Descriptive | Code Available | 6 |
| Fundamental Components of Deep Learning: A category-theoretic approach | Mar 13, 2024 | Deep Learning, Descriptive | Code Available | 5 |
| Ultra-High-Resolution Image Synthesis: Data, Method and Evaluation | Jun 2, 2025 | 4k, Descriptive | Code Available | 3 |
| Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey | Dec 3, 2024 | Change Detection, Descriptive | Code Available | 3 |
| ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation | Sep 20, 2024 | Descriptive, Question Answering | Code Available | 3 |
| Descriptive Image Quality Assessment in the Wild | May 29, 2024 | Descriptive, Image Quality Assessment | Code Available | 3 |
| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Apr 15, 2024 | Contrastive Learning, Descriptive | Code Available | 3 |
| A Survey on Self-Supervised Learning for Non-Sequential Tabular Data | Feb 2, 2024 | Contrastive Learning, Descriptive | Code Available | 3 |