| GRiT: A Generative Region-to-text Transformer for Object Understanding | Dec 1, 2022 | Decoder, Dense Captioning | Code Available | 2 |
| PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning | Nov 21, 2022 | 3D Classification, 3D Object Detection | Code Available | 2 |
| What the DAAM: Interpreting Stable Diffusion Using Cross Attention | Oct 10, 2022 | Denoising, Descriptive | Code Available | 2 |
| What does a platypus look like? Generating customized prompts for zero-shot image classification | Sep 7, 2022 | Descriptive, image-classification | Code Available | 2 |
| SCAMPS: Synthetics for Camera Measurement of Physiological Signals | Jun 8, 2022 | Descriptive, Diversity | Code Available | 2 |
| Fine-grained Image Captioning with CLIP Reward | May 26, 2022 | Caption Generation, Descriptive | Code Available | 2 |
| K-LITE: Learning Transferable Visual Models with External Knowledge | Apr 20, 2022 | Benchmarking, Descriptive | Code Available | 2 |
| Language-driven Semantic Segmentation | Jan 10, 2022 | Descriptive, Few-Shot Semantic Segmentation | Code Available | 2 |
| Describe Anything Model for Visual Question Answering on Text-rich Images | Jul 16, 2025 | Descriptive, Language Modeling | Code Available | 1 |
| Dataset Distillation via Vision-Language Category Prototype | Jun 30, 2025 | Dataset Distillation, Descriptive | Code Available | 1 |