| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Apr 15, 2024 | Contrastive Learning, Descriptive | Code Available | 3 | 5 |
| K-LITE: Learning Transferable Visual Models with External Knowledge | Apr 20, 2022 | Benchmarking, Descriptive | Code Available | 2 | 5 |
| Language-driven Semantic Segmentation | Jan 10, 2022 | Descriptive, Few-Shot Semantic Segmentation | Code Available | 2 | 5 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Dec 1, 2022 | Decoder, Dense Captioning | Code Available | 2 | 5 |
| MedCalc-Bench: Evaluating Large Language Models for Medical Calculations | Jun 17, 2024 | Descriptive, Medical Diagnosis | Code Available | 2 | 5 |
| Fine-grained Image Captioning with CLIP Reward | May 26, 2022 | Caption Generation, Descriptive | Code Available | 2 | 5 |
| FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression | Dec 5, 2024 | Descriptive, Visual Question Answering | Code Available | 2 | 5 |
| Customization Assistant for Text-to-image Generation | Dec 5, 2023 | Descriptive, Image Generation | Code Available | 2 | 5 |
| DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification | Jul 4, 2024 | Descriptive, Diversity | Code Available | 2 | 5 |
| AmadeusGPT: a natural language interface for interactive animal behavioral analysis | Jul 10, 2023 | Descriptive | Code Available | 2 | 5 |