| FITA: Fine-grained Image-Text Aligner for Radiology Report Generation | May 2, 2024 | Descriptive, Triplet | Unverified | 0 |
| CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions | May 1, 2024 | Descriptive, Language Modeling | Unverified | 0 |
| Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model | Apr 30, 2024 | Descriptive, Gesture Generation | Unverified | 0 |
| Análise de ambiguidade linguística em modelos de linguagem de grande escala (LLMs) | Apr 25, 2024 | Descriptive | Unverified | 0 |
| Aligning LLM Agents by Learning Latent Preference from User Edits | Apr 23, 2024 | Descriptive, Language Modelling | Code Available | 1 |
| A Survey of Decomposition-Based Evolutionary Multi-Objective Optimization: Part II -- A Data Science Perspective | Apr 22, 2024 | Anatomy, Descriptive | Unverified | 0 |
| Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images | Apr 21, 2024 | Descriptive | Unverified | 0 |
| ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis | Apr 15, 2024 | Descriptive, Image Captioning | Code Available | 0 |
| Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation | Apr 15, 2024 | Contrastive Learning, Descriptive | Code Available | 3 |
| TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Apr 14, 2024 | Dense Video Captioning, Descriptive | Code Available | 2 |