Recipe Generation from Unsegmented Cooking Videos | Sep 21, 2022 | Dense Video Captioning, Recipe Generation | Unverified | 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | Sep 15, 2022 | Action Classification, Action Recognition | Unverified | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Sep 13, 2022 | Image Generation, Story Continuation | Code Available | 0
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Sep 4, 2022 | Fill Mask, Optical Flow Estimation | Code Available | 1
Partially Relevant Video Retrieval | Aug 26, 2022 | Moment Retrieval, Multiple Instance Learning | Code Available | 1
Diverse Video Captioning by Adaptive Spatio-temporal Attention | Aug 19, 2022 | Decoder, Diversity | Code Available | 0
Boosting Video-Text Retrieval with Explicit High-Level Semantics | Aug 8, 2022 | Retrieval, Text Retrieval | Unverified | 0
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions | Jul 24, 2022 | Dense Captioning, Dense Video Captioning | Unverified | 0
Zero-Shot Video Captioning with Evolving Pseudo-Tokens | Jul 22, 2022 | Image Captioning, Image-text matching | Code Available | 1
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training | Jul 18, 2022 | Dense Video Captioning, Event Detection | Code Available | 1
Dual-Stream Transformer for Generic Event Boundary Captioning | Jul 7, 2022 | Boundary Captioning, Video Captioning | Code Available | 0
PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning | Jul 6, 2022 | Dense Video Captioning, Video Captioning | Unverified | 0
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Jun 30, 2022 | Caption Generation, Video Captioning | Code Available | 1
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | Jun 26, 2022 | Contrastive Learning, Diversity | Code Available | 1
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Jun 14, 2022 | Decoder, Language Modeling | Code Available | 1
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Jun 9, 2022 | Image Captioning, Image Classification | Code Available | 2
Modality Alignment between Deep Representations for Effective Video-and-Language Learning | Jun 1, 2022 | Question Answering, Video Captioning | Unverified | 0
GIT: A Generative Image-to-text Transformer for Vision and Language | May 27, 2022 | Decoder, Image Captioning | Code Available | 2
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | May 22, 2022 | Attribute, Automatic Speech Recognition | Code Available | 1
GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1
Support-set based Multi-modal Representation Enhancement for Video Captioning | May 19, 2022 | Video Captioning | Code Available | 0
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information | May 7, 2022 | Text Generation, Video Captioning | Unverified | 0
Dual-Level Decoupled Transformer for Video Captioning | May 6, 2022 | Descriptive, Sentence | Unverified | 0
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos | Apr 28, 2022 | Action Understanding, Video Captioning | Code Available | 0
End-to-end Dense Video Captioning as Sequence Generation | Apr 18, 2022 | Dense Video Captioning, Descriptive | Unverified | 0
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Apr 17, 2022 | Navigate, Retrieval | Code Available | 1
Semantic-Aware Pretraining for Dense Video Captioning | Apr 13, 2022 | Dense Captioning, Dense Video Captioning | Unverified | 0
Video Captioning: a comparative review of where we are and which could be the route | Apr 12, 2022 | Video Captioning | Unverified | 0
Learning Audio-Video Modalities from Image Captions | Apr 1, 2022 | Image Captioning, Retrieval | Unverified | 0
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | Mar 31, 2022 | Retrieval, Video Captioning | Unverified | 0
Global2Local: A Joint-Hierarchical Attention for Video Captioning | Mar 13, 2022 | Video Captioning | Unverified | 0
Exploiting long-term temporal dynamics for video captioning | Feb 22, 2022 | Video Captioning | Unverified | 0
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment | Jan 25, 2022 | Language Modeling, Language Modelling | Code Available | 0
An Integrated Approach for Video Captioning and Applications | Jan 23, 2022 | Image Captioning, Video Captioning | Unverified | 0
Generative Adversarial Network Applications in Creating a Meta-Universe | Jan 23, 2022 | Generative Adversarial Network, Image-to-Image Translation | Unverified | 0
End-to-end Generative Pretraining for Multimodal Video Captioning | Jan 20, 2022 | Action Classification, Decoder | Unverified | 0
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | Jan 17, 2022 | Video Captioning, Visual Dialog | Unverified | 0
End-to-end Dense Video Captioning as Sequence Generation | Jan 16, 2022 | Dense Video Captioning, Descriptive | Unverified | 0
Boosting Video Representation Learning with Multi-Faceted Integration | Jan 11, 2022 | Action Recognition, Representation Learning | Unverified | 0
Variational Stacked Local Attention Networks for Diverse Video Captioning | Jan 4, 2022 | Decoder, Diversity | Unverified | 0
Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Dec 18, 2021 | Action Recognition, Descriptive | Code Available | 1
Dense Video Captioning Using Unsupervised Semantic Information | Dec 15, 2021 | Dense Video Captioning, Video Captioning | Code Available | 0
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising | Dec 14, 2021 | Cross-Modal Retrieval, Decoder | Unverified | 0
Syntax Customized Video Captioning by Imitating Exemplar Sentences | Dec 2, 2021 | Decoder, Diversity | Code Available | 0
Controllable Video Captioning with an Exemplar Sentence | Dec 2, 2021 | Caption Generation, Decoder | Code Available | 1
An Efficient Keyframes Selection Based Framework for Video Captioning | Dec 1, 2021 | Text Generation, Video Captioning | Unverified | 0
Multi-modal Dependency Tree for Video Captioning | Dec 1, 2021 | Caption Generation, Dependency Parsing | Unverified | 0
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Nov 30, 2021 | Caption Generation, Representation Learning | Code Available | 0
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Nov 25, 2021 | Caption Generation, Question Answering | Code Available | 1
Hierarchical Modular Network for Video Captioning | Nov 24, 2021 | Representation Learning, Sentence | Code Available | 1