| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Aug 8, 2023 | 3D Question Answering (3D-QA), Dense Captioning | Code Available | 2 |
| 3D-LLM: Injecting the 3D World into Large Language Models | Jul 24, 2023 | 3D Object Captioning, 3D Question Answering (3D-QA) | Code Available | 3 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning, Image Captioning | Code Available | 1 |
| IIITD-20K: Dense captioning for Text-Image ReID | May 8, 2023 | Dense Captioning | Code Available | 0 |
| CapDet: Unifying Dense Captioning and Open-World Detection Pretraining | Mar 4, 2023 | Dense Captioning | Unverified | 0 |
| End-to-End 3D Dense Captioning with Vote2Cap-DETR | Jan 6, 2023 | 3D dense captioning, Decoder | Code Available | 1 |
| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Jan 1, 2023 | 3D dense captioning, 3D visual grounding | Code Available | 1 |
| UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | Dec 1, 2022 | 3D dense captioning, 3D visual grounding | Unverified | 0 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Dec 1, 2022 | Decoder, Dense Captioning | Code Available | 2 |
| Contextual Modeling for 3D Dense Captioning on Point Clouds | Oct 8, 2022 | 3D dense captioning, Dense Captioning | Unverified | 0 |