Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval · Apr 1, 2021 · Retrieval, Text Retrieval · Code Available · 1
A Comprehensive Review of the Video-to-Text Problem · Mar 27, 2021 · Question Answering, Retrieval · Code Available · 1
The MSR-Video to Text Dataset with Clean Annotations · Feb 12, 2021 · Sentence, Video Captioning · Code Available · 1
Semantic Grouping Network for Video Captioning · Feb 1, 2021 · Video Captioning · Code Available · 1
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules · Jan 8, 2021 · Decoder, Deep Reinforcement Learning · Code Available · 1
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks · Nov 23, 2020 · Action Classification, Action Localization · Code Available · 1
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language · Nov 18, 2020 · Dictionary Learning, Disentanglement · Code Available · 1
Multimodal Pretraining for Dense Video Captioning · Nov 10, 2020 · Dense Video Captioning, Video Captioning · Code Available · 1
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning · Nov 1, 2020 · Cross-Modal Retrieval, Representation Learning · Code Available · 1
Improved Actor Relation Graph based Group Activity Recognition · Oct 24, 2020 · Activity Recognition, Group Activity Recognition · Code Available · 1
Poet: Product-oriented Video Captioner for E-commerce · Aug 16, 2020 · Video Captioning · Code Available · 1
SODA: Story Oriented Dense Video Captioning Evaluation Framework · Aug 1, 2020 · Dense Video Captioning, Video Captioning · Code Available · 1
Learning to Generate Grounded Visual Captions without Localization Supervision · Aug 1, 2020 · Image Captioning, Language Modelling · Code Available · 1
Learning to Discretely Compose Reasoning Module Networks for Video Captioning · Jul 17, 2020 · Decoder, Question Answering · Code Available · 1
Comprehensive Information Integration Modeling Framework for Video Titling · Jun 24, 2020 · Descriptive, Video Captioning · Code Available · 1
Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 · Jun 21, 2020 · Dense Captioning, Dense Video Captioning · Code Available · 1
Video Moment Localization using Object Evidence and Reverse Captioning · Jun 18, 2020 · Language-Based Temporal Localization, Language Modelling · Code Available · 1
Syntax-Aware Action Targeting for Video Captioning · Jun 1, 2020 · Video Captioning · Code Available · 1
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer · May 17, 2020 · Dense Video Captioning, Temporal Action Proposal Generation · Code Available · 1
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning · May 11, 2020 · Sentence, Video Captioning · Code Available · 1
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos · May 2, 2020 · Action Detection, Form · Code Available · 1
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training · May 1, 2020 · Language Modeling, Language Modelling · Code Available · 1
Multi-modal Dense Video Captioning · Mar 17, 2020 · Automatic Speech Recognition, Automatic Speech Recognition (ASR) · Code Available · 1
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning · Mar 11, 2020 · Question Answering, Video Captioning · Code Available · 1
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation · Feb 15, 2020 · Action Segmentation, Decoder · Code Available · 1
Delving Deeper into the Decoder for Video Captioning · Jan 16, 2020 · Decoder, Sentence · Code Available · 1
Learning to Generate Grounded Visual Captions without Localization Supervision · Jun 1, 2019 · Image Captioning, Language Modelling · Code Available · 1
Large Scale Holistic Video Understanding · Apr 25, 2019 · Action Classification, Action Recognition · Code Available · 1
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment · Apr 8, 2019 · Action Classification, Action Quality Assessment · Code Available · 1
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research · Apr 6, 2019 · Machine Translation, Translation · Code Available · 1
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation · Aug 17, 2016 · Caption Generation, Decoder · Code Available · 1
Video captioning with recurrent networks based on frame- and video-level features and visual content classification · Dec 9, 2015 · Caption Generation, General Classification · Code Available · 1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization · Jun 25, 2025 · Dense Video Captioning, Descriptive · Unverified · 0
Dense Video Captioning using Graph-based Sentence Summarization · Jun 25, 2025 · Dense Video Captioning, Sentence · Unverified · 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks · Jun 10, 2025 · Multiple-choice, Open-Ended Question Answering · Unverified · 0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs · Jun 9, 2025 · Descriptive, Form · Unverified · 0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks · May 22, 2025 · Caption Generation, Video Captioning · Unverified · 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks · May 19, 2025 · Video Captioning · Code Available · 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation · Apr 24, 2025 · Caption Generation, Dense Video Captioning · Unverified · 0
Describe Anything: Detailed Localized Image and Video Captioning · Apr 22, 2025 · Sentence, Video Captioning · Unverified · 0
FocusedAD: Character-centric Movie Audio Description · Apr 16, 2025 · Video Captioning · Code Available · 0
The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning · Mar 31, 2025 · Video Captioning · Unverified · 0
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding · Mar 14, 2025 · Denoising, Dense Video Captioning · Unverified · 0
Get In Video: Add Anything You Want to the Video · Mar 8, 2025 · object-detection, Object Detection · Unverified · 0
Fine-Grained Video Captioning through Scene Graph Consolidation · Feb 23, 2025 · Caption Generation, Image Captioning · Unverified · 0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models · Feb 21, 2025 · Caption Generation, Video Captioning · Unverified · 0
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning · Feb 19, 2025 · Knowledge Distillation, Object · Unverified · 0
Pretrained Image-Text Models are Secretly Video Captioners · Feb 19, 2025 · Image Captioning, Video Captioning · Code Available · 0
MAMS: Model-Agnostic Module Selection Framework for Video Captioning · Jan 30, 2025 · Caption Generation, Video Captioning · Unverified · 0
Classifier-Guided Captioning Across Modalities · Jan 3, 2025 · Audio captioning, Video Captioning · Unverified · 0