Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code available (4)
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning | Feb 27, 2024 | class-incremental learning, Class Incremental Learning | Unverified (0)
Video ReCap: Recursive Captioning of Hour-Long Videos | Feb 20, 2024 | EgoSchema, Video Captioning | Code available (3)
LVCHAT: Facilitating Long Video Comprehension | Feb 19, 2024 | Video Captioning | Code available (1)
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark | Jan 25, 2024 | Decoder, Video Captioning | Unverified (0)
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data | Jan 16, 2024 | Image Generation, Text to Image Generation | Code available (1)
SnapCap: Efficient Snapshot Compressive Video Captioning | Jan 10, 2024 | Compressive Sensing, Video Captioning | Unverified (0)
On Scaling Up a Multilingual Vision and Language Model | Jan 1, 2024 | document understanding, In-Context Learning | Unverified (0)
Retrieval-Augmented Egocentric Video Captioning | Jan 1, 2024 | Representation Learning, Retrieval | Unverified (0)
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | Dec 25, 2023 | Image Generation, Text to Image Generation | Code available (0)
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | Dec 25, 2023 | Caption Generation, Diversity | Unverified (0)
SOVC: Subject-Oriented Video Captioning | Dec 20, 2023 | Video Captioning | Unverified (0)
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Dec 16, 2023 | Video Captioning, video narration captioning | Code available (1)
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | Dec 12, 2023 | Decoder, Video Captioning | Unverified (0)
Video Summarization: Towards Entity-Aware Captions | Dec 1, 2023 | Image Captioning, Video Captioning | Code available (0)
RTQ: Rethinking Video-language Understanding Based on Image-text Model | Dec 1, 2023 | Video Captioning, Video Question Answering | Code available (1)
VTimeLLM: Empower LLM to Grasp Video Moments | Nov 30, 2023 | Dense Video Captioning, Temporal Relation Extraction | Code available (2)
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos | Nov 28, 2023 | Dense Video Captioning, Transfer Learning | Unverified (0)
Incorporating granularity bias as the margin into contrastive loss for video captioning | Nov 25, 2023 | Contrastive Learning, Sentence | Unverified (0)
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | Nov 5, 2023 | Caption Generation, Dense Video Captioning | Unverified (0)
Nepali Video Captioning using CNN-RNN Architecture | Nov 5, 2023 | Video Captioning | Unverified (0)
Learning Interactive Real-World Simulators | Oct 9, 2023 | Video Captioning | Unverified (0)
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Oct 7, 2023 | Automatic Speech Recognition, Video Captioning | Code available (1)
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Oct 7, 2023 | Action Recognition, Multiple-choice | Code available (0)
IcoCap: Improving Video Captioning by Compounding Images | Oct 5, 2023 | Image Captioning, Video Captioning | Unverified (0)
Human-centric Behavior Description in Videos: New Benchmark and Model | Oct 4, 2023 | Video Captioning | Unverified (0)
Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning | Oct 2, 2023 | Decoder, Sentence | Unverified (0)
VidChapters-7M: Video Chapters at Scale | Sep 25, 2023 | Dense Video Captioning, Navigate | Code available (2)
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | Sep 25, 2023 | Anomaly Detection, Dense Video Captioning | Unverified (0)
Accurate and Fast Compressed Video Captioning | Sep 22, 2023 | Video Captioning | Code available (1)
Collaborative Three-Stream Transformers for Video Captioning | Sep 18, 2023 | Sentence, Video Captioning | Unverified (0)
SoccerNet 2023 Challenges Results | Sep 12, 2023 | Action Spotting, Camera Calibration | Code available (1)
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning | Aug 25, 2023 | Image Captioning, Video Captioning | Code available (1)
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control | Aug 18, 2023 | Image Captioning, Text Generation | Code available (1)
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval | Aug 15, 2023 | Retrieval, Video Captioning | Code available (1)
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion | Aug 13, 2023 | Video Captioning | Unverified (0)
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Aug 8, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available (1)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Jul 27, 2023 | Automatic Speech Recognition, Contrastive Learning | Code available (1)
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | Jul 5, 2023 | Dense Video Captioning, Language Modelling | Unverified (0)
CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning | Jun 30, 2023 | Causal Inference, Medical Report Generation | Code available (3)
Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos | Jun 27, 2023 | Multi-Task Learning, Scene Understanding | Unverified (0)
Exploring the Role of Audio in Video Captioning | Jun 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified (0)
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Jun 20, 2023 | Cross-Lingual Transfer, Retrieval | Code available (0)
LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning | Jun 17, 2023 | Boundary Captioning, Language Modeling | Code available (1)
Knowledge Distillation for Efficient Audio-Visual Video Captioning | Jun 16, 2023 | Audio-Visual Video Captioning, Caption Generation | Unverified (0)
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | Jun 15, 2023 | Form, model | Code available (1)
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks | Jun 7, 2023 | Cross-Modal Retrieval, Language Modelling | Code available (2)
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | May 29, 2023 | Audio captioning, Audio-Visual Captioning | Code available (2)
PaLI-X: On Scaling up a Multilingual Vision and Language Model | May 29, 2023 | Chart Question Answering, document understanding | Code available (1)
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | May 22, 2023 | Question Answering, Retrieval | Unverified (0)