SOTAVerified

Dense Captioning

Papers

Showing 1–50 of 69 papers

| Title | Status | Hype |
|---|---|---|
| PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Code | 3 |
| 3D-LLM: Injecting the 3D World into Large Language Models | Code | 3 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | Code | 2 |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Code | 2 |
| TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Code | 2 |
| Grounded 3D-LLM with Referent Tokens | Code | 2 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Code | 2 |
| ControlCap: Controllable Region-level Captioning | Code | 2 |
| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1 |
| Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization | Code | 1 |
| Dense-Captioning Events in Videos | Code | 1 |
| Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | Code | 1 |
| X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1 |
| Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | Code | 1 |
| ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Code | 1 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1 |
| End-to-End 3D Dense Captioning with Vote2Cap-DETR | Code | 1 |
| 3D Vision and Language Pretraining with Large-Scale Synthetic Data | Code | 1 |
| TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | Code | 1 |
| STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1 |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1 |
| PerLA: Perceptive 3D Language Assistant | Code | 1 |
| Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1 |
| MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1 |
| A Hierarchical Approach for Generating Descriptive Image Paragraphs | Code | 0 |
| DenseCap: Fully Convolutional Localization Networks for Dense Captioning | Code | 0 |
| Dense Captioning with Joint Inference and Visual Context | Code | 0 |
| Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning | Code | 0 |
| IIITD-20K: Dense captioning for Text-Image ReID | Code | 0 |
| Joint Event Detection and Description in Continuous Video Streams | Code | 0 |
| PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation | Code | 0 |
| Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Code | 0 |
| Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions | | 0 |
| DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | | 0 |
| Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs | | 0 |
| Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving | | 0 |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | | 0 |
| Improving Diversity and Reducing Redundancy in Paragraph Captions | | 0 |
| Dense Procedure Captioning in Narrated Instructional Videos | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |
| xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | | 0 |
| Contextual Modeling for 3D Dense Captioning on Point Clouds | | 0 |
| Context and Attribute Grounded Dense Captioning | | 0 |
| Complete 3d relationships extraction modality alignment network for 3d dense captioning | | 0 |
| 3D Scene Graph Guided Vision-Language Pre-training | | 0 |
| CapOnImage: Context-driven Dense-Captioning on Image | | 0 |
| CapDet: Unifying Dense Captioning and Open-World Detection Pretraining | | 0 |
| YH Technologies at ActivityNet Challenge 2018 | | 0 |

Benchmark Results

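| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ControlCap | mAP | 18.2 | | Unverified |
| 2 | GRiT (ViT-B) | mAP | 15.5 | | Unverified |
| 3 | CAG-Net | mAP | 10.5 | | Unverified |
| 4 | FCLN | mAP | 5.4 | | Unverified |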