SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which in turn supports efficient text-based retrieval of the video.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 101–150 of 473 papers

Title | Status | Hype
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Code | 1
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | Code | 1
Poet: Product-oriented Video Captioner for E-commerce | Code | 1
Discriminative Latent Semantic Graph for Video Captioning | Code | 1
RTQ: Rethinking Video-language Understanding Based on Image-text Model | Code | 1
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | Code | 1
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Code | 1
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | Code | 1
Large Scale Holistic Video Understanding | Code | 1
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation | Code | 1
HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning | Code | 1
Semantic Grouping Network for Video Captioning | Code | 1
The MSR-Video to Text Dataset with Clean Annotations | Code | 1
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching | Code | 1
Improving Generation and Evaluation of Visual Stories via Semantic Consistency | Code | 1
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Code | 1
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation | Code | 1
Syntax-Aware Action Targeting for Video Captioning | Code | 1
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation | Code | 1
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Hierarchical Modular Network for Video Captioning | Code | 1
Accurate and Fast Compressed Video Captioning | Code | 1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Code | 1
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Code | 1
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language | Code | 1
Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training | Code | 1
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | Code | 1
End-to-End Video Captioning with Multitask Reinforcement Learning | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Code | 0
End-to-End Dense Video Captioning with Masked Transformer | Code | 0
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Code | 0
Streamlined Dense Video Captioning | Code | 0
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Code | 0
Video captioning with stacked attention and semantic hard pull | Code | 0
Edit As You Wish: Video Caption Editing with Multi-grained User Control | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
Support-set based Multi-modal Representation Enhancement for Video Captioning | Code | 0
Reconstruction Network for Video Captioning | Code | 0
Dual-Stream Transformer for Generic Event Boundary Captioning | Code | 0
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0
Accommodating Audio Modality in CLIP for Multimodal Processing | Code | 0
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Code | 0

Benchmark Results

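# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | | Unverified
2 | VAST | CIDEr | 78 | | Unverified
3 | GIT2 | CIDEr | 75.9 | | Unverified
4 | VLAB | CIDEr | 74.9 | | Unverified
5 | COSA | CIDEr | 74.7 | | Unverified
6 | VALOR | CIDEr | 74 | | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified
8 | VideoCoCa | CIDEr | 73.2 | | Unverified
9 | RTQ | CIDEr | 69.3 | | Unverified
10 | HowToCaption | CIDEr | 65.3 | | Unverified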

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | | Unverified
2 | VLAB | CIDEr | 179.8 | | Unverified
3 | COSA | CIDEr | 178.5 | | Unverified
4 | VALOR | CIDEr | 178.5 | | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | | Unverified
6 | HowToCaption | CIDEr | 154.2 | | Unverified
7 | HiTeA | CIDEr | 146.9 | | Unverified
8 | Vid2Seq | CIDEr | 146.2 | | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | | Unverified
10 | RTQ | CIDEr | 123.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | | Unverified
2 | VAST | BLEU-4 | 45 | | Unverified
3 | COSA | BLEU-4 | 43.7 | | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified
10 | NITS-VC | BLEU-4 | 20 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | | Unverified
2 | GIT | CIDEr | 32.43 | | Unverified
3 | SEM-POS | CIDEr | 26.01 | | Unverified
4 | AKGNN | CIDEr | 25.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | | Unverified
2 | GIT | CIDEr | 45.63 | | Unverified
3 | SEM-POS | CIDEr | 37.16 | | Unverified
4 | AKGNN | CIDEr | 35.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | | Unverified
2 | COSA | BLEU-4 | 18.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | | Unverified