SOTAVerified

Dense Captioning

Papers

Showing 1-50 of 69 papers

Title | Status | Hype
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Code | 3
3D-LLM: Injecting the 3D World into Large Language Models | Code | 3
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
ControlCap: Controllable Region-level Captioning | Code | 2
GRiT: A Generative Region-to-text Transformer for Object Understanding | Code | 2
Grounded 3D-LLM with Referent Tokens | Code | 2
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Code | 2
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Code | 2
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Code | 2
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1
Dense-Captioning Events in Videos | Code | 1
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | Code | 1
Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | Code | 1
Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | Code | 1
Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization | Code | 1
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Code | 1
End-to-End 3D Dense Captioning with Vote2Cap-DETR | Code | 1
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
3D Vision and Language Pretraining with Large-Scale Synthetic Data | Code | 1
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1
PerLA: Perceptive 3D Language Assistant | Code | 1
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | | 0
3D Scene Graph Guided Vision-Language Pre-training | | 0
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | | 0
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | | 0
Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos | | 0
Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos | | 0
Bi-directional Contextual Attention for 3D Dense Captioning | | 0
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining | | 0
CapOnImage: Context-driven Dense-Captioning on Image | | 0
Complete 3d relationships extraction modality alignment network for 3d dense captioning | | 0
Context and Attribute Grounded Dense Captioning | | 0
Contextual Modeling for 3D Dense Captioning on Point Clouds | | 0
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0
Dense Procedure Captioning in Narrated Instructional Videos | | 0
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs | | 0
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | | 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | | 0
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | | 0
FlexCap: Describe Anything in Images in Controllable Detail | | 0
Fooling Vision and Language Models Despite Localization and Attention Mechanism | | 0
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions | | 0
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving | | 0
Improving Diversity and Reducing Redundancy in Paragraph Captions | | 0
See It All: Contextualized Late Aggregation for 3D Dense Captioning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ControlCap | mAP | 18.2 | | Unverified
2 | GRiT (ViT-B) | mAP | 15.5 | | Unverified
3 | CAG-Net | mAP | 10.5 | | Unverified
4 | FCLN | mAP | 5.4 | | Unverified