| PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | Mar 13, 2024 | Caption GenerationDiagnostic | —Unverified | 0 |
| Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Mar 11, 2024 | Caption Generationreinforcement-learning | —Unverified | 0 |
| MeaCap: Memory-Augmented Zero-shot Image Captioning | Mar 6, 2024 | Caption GenerationImage Captioning | CodeCode Available | 2 |
| LLMs in Political Science: Heralding a New Era of Visual Analysis | Feb 29, 2024 | Caption GenerationFace Identification | —Unverified | 0 |
| Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption GenerationLanguage Modeling | —Unverified | 0 |
| Social Media Ready Caption Generation for Brands | Jan 3, 2024 | Caption GenerationImage Captioning | —Unverified | 0 |
| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous DrivingCaption Generation | —Unverified | 0 |
| Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | Dec 25, 2023 | Caption GenerationDiversity | —Unverified | 0 |
| Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Dec 3, 2023 | Caption GenerationDecoder | CodeCode Available | 0 |
| Segment and Caption Anything | Dec 1, 2023 | Caption Generationobject-detection | CodeCode Available | 2 |
| Enhancing Image Captioning with Neural Models | Dec 1, 2023 | Caption GenerationImage Captioning | —Unverified | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption GenerationImage-text Retrieval | —Unverified | 0 |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption GenerationDenoising | —Unverified | 0 |
| NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Nov 5, 2023 | Caption GenerationCommon Sense Reasoning | CodeCode Available | 1 |
| Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | Nov 5, 2023 | Caption GenerationDense Video Captioning | —Unverified | 0 |
| Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | Nov 2, 2023 | Caption GenerationEfficient Exploration | —Unverified | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | Oct 18, 2023 | Caption GenerationInstruction Following | —Unverified | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | Oct 16, 2023 | Caption GenerationDescriptive | —Unverified | 0 |
| ViPE: Visualise Pretty-much Everything | Oct 16, 2023 | Caption GenerationFigurative Language Visualization | CodeCode Available | 0 |
| VLIS: Unimodal Language Models Guide Multimodal Language Generation | Oct 15, 2023 | Caption GenerationExplanation Generation | CodeCode Available | 1 |
| A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | Oct 11, 2023 | Caption GenerationDecoder | —Unverified | 0 |
| Self-supervised Cross-view Representation Reconstruction for Change Captioning | Sep 28, 2023 | Caption GenerationHallucination | CodeCode Available | 1 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | Sep 24, 2023 | AttributeCaption Generation | —Unverified | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | Sep 20, 2023 | Audio captioningCaption Generation | —Unverified | 0 |
| RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCapsAudio captioning | CodeCode Available | 1 |