| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous Driving, Caption Generation | Unverified | 0 |
| Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | Dec 25, 2023 | Caption Generation, Diversity | Unverified | 0 |
| Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT | Dec 3, 2023 | Caption Generation, Decoder | Code Available | 0 |
| Enhancing Image Captioning with Neural Models | Dec 1, 2023 | Caption Generation, Image Captioning | Unverified | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption Generation, Image-text Retrieval | Unverified | 0 |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption Generation, Denoising | Unverified | 0 |
| Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | Nov 5, 2023 | Caption Generation, Dense Video Captioning | Unverified | 0 |
| Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | Nov 2, 2023 | Caption Generation, Efficient Exploration | Unverified | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | Oct 18, 2023 | Caption Generation, Instruction Following | Unverified | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | Oct 16, 2023 | Caption Generation, Descriptive | Unverified | 0 |