
Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it. The resulting text descriptions make it possible to retrieve videos efficiently through text queries.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 251–300 of 473 papers

Title | Status | Hype
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning | | 0
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching | Code | 1
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | | 0
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0
E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | | 0
Co-segmentation Inspired Attention Module for Video-based Computer Vision Tasks | Code | 1
Visual-aware Attention Dual-stream Decoder for Video Captioning | | 0
CLIP4Caption: CLIP for Video Caption | | 0
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | | 0
OSVidCap: A Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario | Code | 0
Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval | | 0
Hierarchical Multimodal Transformer to Summarize Videos | | 0
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Code | 0
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | Code | 1
End-to-End Dense Video Captioning with Parallel Decoding | Code | 1
Cross-Modal Graph with Meta Concepts for Video Captioning | Code | 0
Discriminative Latent Semantic Graph for Video Captioning | Code | 1
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning | | 0
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers | | 0
Boosting Video Captioning with Dynamic Loss Network | | 0
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability | | 0
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Code | 0
Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning | | 0
Attention based video captioning framework for Hindi | | 0
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Code | 1
DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization | Code | 1
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | Code | 0
Improving Generation and Evaluation of Visual Stories via Semantic Consistency | Code | 1
Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning | | 0
The Use of Video Captioning for Fostering Physical Activity | | 0
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning | | 0
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
Open-book Video Captioning with Retrieve-Copy-Generate Network | | 0
The MSR-Video to Text Dataset with Clean Annotations | Code | 1
Semantic Grouping Network for Video Captioning | Code | 1
Recent Advances in Video Question Answering: A Review of Datasets and Methods | | 0
Exploration of Visual Features and their weighted-additive fusion for Video Captioning | | 0
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules | Code | 1
Video Captioning in Compressed Video | | 0
Motion Guided Region Message Passing for Video Captioning | | 0
Guidance Module Network for Video Captioning | | 0
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish | | 0
Understanding Action Sequences based on Video Captioning for Learning-from-Observation | | 0
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | Code | 1
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language | Code | 1
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | | 0
ActBERT: Learning Global-Local Video-Text Representations | Code | 0

Benchmark Results

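Rank | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | | Unverified
2 | VAST | CIDEr | 78 | | Unverified
3 | GIT2 | CIDEr | 75.9 | | Unverified
4 | VLAB | CIDEr | 74.9 | | Unverified
5 | COSA | CIDEr | 74.7 | | Unverified
6 | VALOR | CIDEr | 74 | | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified
8 | VideoCoCa | CIDEr | 73.2 | | Unverified
9 | RTQ | CIDEr | 69.3 | | Unverified
10 | HowToCaption | CIDEr | 65.3 | | Unverified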
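
The table above ranks models by CIDEr, which compares a candidate caption with the reference captions of the same video through TF-IDF-weighted n-gram vectors: n-grams that appear in the references of many videos get low weight, and the per-order cosine similarities are averaged over the references and over n = 1 to 4. The following is a minimal sketch of the original CIDEr formulation in Python, assuming whitespace-tokenized captions and giving n-grams unseen in any reference a document frequency of 1 (an assumption made here so the logarithm stays defined). The function names are illustrative. Leaderboards generally report the CIDEr-D variant, which additionally clips n-gram counts, applies a Gaussian length penalty and rescales the score, so the numbers above are not directly comparable with this sketch's output.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of length n in one tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts: Counter, doc_freq: Counter, num_videos: int) -> Dict[NGram, float]:
    """Weight each n-gram by term frequency times log(num_videos / document frequency)."""
    total = sum(counts.values())
    # Assumption: n-grams never seen in any reference set get document frequency 1.
    return {g: (c / total) * math.log(num_videos / max(1, doc_freq[g]))
            for g, c in counts.items()}


def cosine(a: Dict[NGram, float], b: Dict[NGram, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def cider(candidates: List[List[str]], references: List[List[List[str]]],
          max_n: int = 4) -> float:
    """Corpus CIDEr (original formulation, not CIDEr-D): mean of per-video scores."""
    num_videos = len(references)
    per_video = [0.0] * num_videos
    for n in range(1, max_n + 1):
        # Document frequency: number of videos whose reference set contains the n-gram.
        doc_freq: Counter = Counter()
        for refs in references:
            doc_freq.update(set().union(*(ngram_counts(r, n) for r in refs)))
        for i, (cand, refs) in enumerate(zip(candidates, references)):
            cand_vec = tfidf(ngram_counts(cand, n), doc_freq, num_videos)
            sims = [cosine(cand_vec, tfidf(ngram_counts(r, n), doc_freq, num_videos))
                    for r in refs]
            # Average over the references, then weight each n-gram order by 1 / max_n.
            per_video[i] += sum(sims) / len(refs) / max_n
    return sum(per_video) / num_videos


example_refs = [["a man is playing a guitar".split()],
                ["a woman is slicing an onion".split()]]
print(round(cider([r[0] for r in example_refs], example_refs), 3))  # 1.0
```

Because the IDF weights come from the whole reference corpus, a single video scored on its own gets all-zero weights (every n-gram then has a document frequency equal to the number of videos), so CIDEr is only meaningful when computed over a set of videos.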
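
Rank | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | | Unverified
2 | VLAB | CIDEr | 179.8 | | Unverified
3 | COSA | CIDEr | 178.5 | | Unverified
4 | VALOR | CIDEr | 178.5 | | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | | Unverified
6 | HowToCaption | CIDEr | 154.2 | | Unverified
7 | HiTeA | CIDEr | 146.9 | | Unverified
8 | Vid2Seq | CIDEr | 146.2 | | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | | Unverified
10 | RTQ | CIDEr | 123.4 | | Unverified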

Rank | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified
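
The table above ranks models by BLEU-4: the geometric mean of the clipped 1- to 4-gram precisions of the generated captions, multiplied by a brevity penalty that punishes captions shorter than their references. The following is a minimal corpus-level sketch in Python, assuming whitespace-tokenized captions, uniform n-gram weights and no smoothing; the function names are illustrative. Evaluation toolkits such as the COCO caption evaluation code also tokenize the text and add small constants to the precision terms, so their scores can differ slightly. The leaderboard values appear to be on a 0 to 100 scale, hence the factor of 100 in the usage lines.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of length n in one tokenized sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates: List[List[str]],
                 references: List[List[List[str]]]) -> float:
    """Corpus-level BLEU-4: uniform weights over n = 1..4, no smoothing (sketch)."""
    matched = [0, 0, 0, 0]  # clipped n-gram matches, summed over the corpus
    total = [0, 0, 0, 0]    # candidate n-gram counts, summed over the corpus
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            # Clip each candidate n-gram count by its maximum count in any one reference.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)
            cand_counts = ngram_counts(cand, n)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if min(matched) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


reference = "a man is playing a guitar".split()
print(round(100 * corpus_bleu4([reference], [[reference]]), 1))                   # 100.0
print(round(100 * corpus_bleu4(["a man is playing".split()], [[reference]]), 1))  # 60.7
```

The second call matches every n-gram but is two tokens shorter than its only reference, so the whole loss comes from the brevity penalty exp(1 - 6/4). Counts are pooled over all captions before the geometric mean is taken, which is what distinguishes corpus-level BLEU from an average of per-sentence scores.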

Rank | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | | Unverified
2 | VAST | BLEU-4 | 45 | | Unverified
3 | COSA | BLEU-4 | 43.7 | | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified
10 | NITS-VC | BLEU-4 | 20 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU-4 | 14.7 | | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU-4 | 14.5 | | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU-4 | 13.38 | | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU-4 | 10.85 | | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU-4 | 10.33 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | | Unverified
2 | GIT | CIDEr | 32.43 | | Unverified
3 | SEM-POS | CIDEr | 26.01 | | Unverified
4 | AKGNN | CIDEr | 25.9 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | | Unverified
2 | GIT | CIDEr | 45.63 | | Unverified
3 | SEM-POS | CIDEr | 37.16 | | Unverified
4 | AKGNN | CIDEr | 35.08 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU-4 | 41.01 | | Unverified
2 | V+S-Att-based | BLEU-4 | 36.2 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | | Unverified
2 | COSA | BLEU-4 | 18.8 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU-4 | 17.7 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | | Unverified

Rank | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | | Unverified