SOTAVerified

Dense Captioning

Papers

Showing 1-25 of 69 papers

Title | Status | Hype
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Code | 3
3D-LLM: Injecting the 3D World into Large Language Models | Code | 3
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
ControlCap: Controllable Region-level Captioning | Code | 2
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Code | 2
GRiT: A Generative Region-to-text Transformer for Object Understanding | Code | 2
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Code | 2
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Code | 2
Grounded 3D-LLM with Referent Tokens | Code | 2
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | Code | 1
ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Code | 1
Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization | Code | 1
Dense-Captioning Events in Videos | Code | 1
3D Vision and Language Pretraining with Large-Scale Synthetic Data | Code | 1
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1
End-to-End 3D Dense Captioning with Vote2Cap-DETR | Code | 1
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | Code | 1
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | Code | 1
PerLA: Perceptive 3D Language Assistant | Code | 1
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ControlCap | mAP | 18.2 | | Unverified
2 | GRiT (ViT-B) | mAP | 15.5 | | Unverified
3 | CAG-Net | mAP | 10.5 | | Unverified
4 | FCLN | mAP | 5.4 | | Unverified