SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it. Such captions can make video retrieval through text more efficient.
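The definition above says that captions can make videos searchable by text. The following is a minimal sketch of that idea, not a method from any paper listed here. It assumes the captions were already produced by some captioning model. The video IDs, captions, and the helpers `tokenize`, `cosine`, and `retrieve` are all hypothetical. The sketch ranks videos by bag-of-words cosine similarity between a text query and each caption. A practical system would more likely compare learned text embeddings. Plain word overlap is used here only to keep the example self-contained.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query or caption matches nothing
    return dot / (norm_a * norm_b)


def retrieve(query: str, captions: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank video IDs by similarity between the query and each video's caption."""
    q = Counter(tokenize(query))
    scored = [(vid, cosine(q, Counter(tokenize(cap)))) for vid, cap in captions.items()]
    # Sort by score, highest first; ties keep the original insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical captions, as if generated by a video captioning model.
    captions = {
        "vid_001": "a man is playing a guitar on stage",
        "vid_002": "a woman is slicing tomatoes in a kitchen",
        "vid_003": "two players are playing basketball outdoors",
    }
    for vid, score in retrieve("someone playing guitar", captions, top_k=2):
        print(vid, round(score, 3))
    # prints "vid_001 0.365" then "vid_003 0.236"
```

The quality of this kind of retrieval depends entirely on how well the generated captions describe the actions and events in each video.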

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 101–150 of 473 papers

Title | Status | Hype
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Code | 4
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning | | 0
Video ReCap: Recursive Captioning of Hour-Long Videos | Code | 3
LVCHAT: Facilitating Long Video Comprehension | Code | 1
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark | | 0
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data | Code | 1
SnapCap: Efficient Snapshot Compressive Video Captioning | | 0
On Scaling Up a Multilingual Vision and Language Model | | 0
Retrieval-Augmented Egocentric Video Captioning | | 0
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | Code | 0
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | | 0
SOVC: Subject-Oriented Video Captioning | | 0
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | | 0
Video Summarization: Towards Entity-Aware Captions | Code | 0
RTQ: Rethinking Video-language Understanding Based on Image-text Model | Code | 1
VTimeLLM: Empower LLM to Grasp Video Moments | Code | 2
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos | | 0
Incorporating granularity bias as the margin into contrastive loss for video captioning | | 0
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | | 0
Nepali Video Captioning using CNN-RNN Architecture | | 0
Learning Interactive Real-World Simulators | | 0
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Code | 0
IcoCap: Improving Video Captioning by Compounding Images | | 0
Human-centric Behavior Description in Videos: New Benchmark and Model | | 0
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning | | 0
VidChapters-7M: Video Chapters at Scale | Code | 2
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | | 0
Accurate and Fast Compressed Video Captioning | Code | 1
Collaborative Three-Stream Transformers for Video Captioning | | 0
SoccerNet 2023 Challenges Results | Code | 1
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning | Code | 1
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control | Code | 1
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval | Code | 1
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion | | 0
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Code | 1
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Code | 1
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | | 0
CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning | Code | 3
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos | | 0
Exploring the Role of Audio in Video Captioning | | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0
LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning | Code | 1
Knowledge Distillation for Efficient Audio-Visual Video Captioning | | 0
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Code | 1
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Code | 2
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | Code | 2
PaLI-X: On Scaling up a Multilingual Vision and Language Model | Code | 1
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0

Benchmark Results

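# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | | Unverified
2 | VAST | CIDEr | 78 | | Unverified
3 | GIT2 | CIDEr | 75.9 | | Unverified
4 | VLAB | CIDEr | 74.9 | | Unverified
5 | COSA | CIDEr | 74.7 | | Unverified
6 | VALOR | CIDEr | 74 | | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified
8 | VideoCoCa | CIDEr | 73.2 | | Unverified
9 | RTQ | CIDEr | 69.3 | | Unverified
10 | HowToCaption | CIDEr | 65.3 | | Unverified
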
# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | | Unverified
2 | VLAB | CIDEr | 179.8 | | Unverified
3 | COSA | CIDEr | 178.5 | | Unverified
4 | VALOR | CIDEr | 178.5 | | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | | Unverified
6 | HowToCaption | CIDEr | 154.2 | | Unverified
7 | HiTeA | CIDEr | 146.9 | | Unverified
8 | Vid2Seq | CIDEr | 146.2 | | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | | Unverified
10 | RTQ | CIDEr | 123.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | | Unverified
2 | VAST | BLEU-4 | 45 | | Unverified
3 | COSA | BLEU-4 | 43.7 | | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified
10 | NITS-VC | BLEU-4 | 20 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | | Unverified
2 | GIT | CIDEr | 32.43 | | Unverified
3 | SEM-POS | CIDEr | 26.01 | | Unverified
4 | AKGNN | CIDEr | 25.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | | Unverified
2 | GIT | CIDEr | 45.63 | | Unverified
3 | SEM-POS | CIDEr | 37.16 | | Unverified
4 | AKGNN | CIDEr | 35.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | | Unverified
2 | COSA | BLEU-4 | 18.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | | Unverified
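Several of the leaderboards above rank models by BLEU-4, which is listed as either "BLEU-4" or "BLEU4". The following is a rough, hedged illustration of what that metric measures. It is a minimal, unsmoothed, sentence-level sketch that computes clipped n-gram precision for n = 1 to 4, takes their geometric mean, and scales the result by a brevity penalty. It assumes the input is already tokenized and lowercased. The function names `ngrams` and `sentence_bleu4` are illustrative.

Published captioning scores are usually corpus-level, are computed with standard evaluation toolkits that apply their own tokenization, and are reported on a 0 to 100 scale. This sketch returns a value between 0 and 1 and is not expected to reproduce the numbers in the tables.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(candidate: list[str], references: list[list[str]]) -> float:
    """Unsmoothed sentence-level BLEU-4 with uniform weights (1/4 per n-gram order)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate has fewer than n tokens
        # Maximum count of each n-gram across any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words are not over-rewarded.
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate
    # length (ties go to the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    # Precisions are 5/6, 3/5, 2/4 and 1/3, so the score is (1/12) ** 0.25.
    print(round(sentence_bleu4(cand, [ref]), 3))  # → 0.537
```

Clipping keeps a caption from earning credit by repeating a common word. The brevity penalty keeps a very short caption from scoring well on precision alone.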