SOTAVerified

Dense Captioning

Papers

Showing 26-50 of 69 papers

| Title | Status | Hype |
| --- | --- | --- |
| ControlCap: Controllable Region-level Captioning | Code | 2 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Code | 3 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Code | 2 |
| Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1 |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Code | 2 |
| 3D-LLM: Injecting the 3D World into Large Language Models | Code | 3 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1 |
| IIITD-20K: Dense captioning for Text-Image ReID | Code | 0 |
| CapDet: Unifying Dense Captioning and Open-World Detection Pretraining | | 0 |
| End-to-End 3D Dense Captioning with Vote2Cap-DETR | Code | 1 |
| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1 |
| UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | | 0 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Code | 2 |
| Contextual Modeling for 3D Dense Captioning on Point Clouds | | 0 |
| SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions | | 0 |
| CapOnImage: Context-driven Dense-Captioning on Image | | 0 |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1 |
| Semantic-Aware Pretraining for Dense Video Captioning | | 0 |
| MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1 |
| X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1 |
| Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs | | 0 |
| 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |
| Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization | Code | 1 |

Benchmark Results

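| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ControlCap | mAP | 18.2 | | Unverified |
| 2 | GRiT (ViT-B) | mAP | 15.5 | | Unverified |
| 3 | CAG-Net | mAP | 10.5 | | Unverified |
| 4 | FCLN | mAP | 5.4 | | Unverified |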