UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Jul 15, 2025 | Video Captioning, Video Understanding | Code available
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation | Jun 8, 2021 | Multi-Task Learning, Question Answering | Code available
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Sep 26, 2024 | Image Captioning, Retrieval | Code available
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Oct 7, 2023 | Automatic Speech Recognition, Video Captioning | Code available
PaLI-X: On Scaling up a Multilingual Vision and Language Model | May 29, 2023 | Chart Question Answering, document understanding | Code available
Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality | Nov 1, 2022 | Data Augmentation, Image Retrieval | Code available
Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Dec 18, 2021 | Action Recognition, Descriptive | Code available
SODA: Story Oriented Dense Video Captioning Evaluation Framework | Aug 1, 2020 | Dense Video Captioning, Video Captioning | Code available
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation | Mar 21, 2023 | Contrastive Learning, Image Captioning | Code available
Poet: Product-oriented Video Captioner for E-commerce | Aug 16, 2020 | Video Captioning | Code available
A Comprehensive Review of the Video-to-Text Problem | Mar 27, 2021 | Question Answering, Retrieval | Code available
SoccerNet 2023 Challenges Results | Sep 12, 2023 | Action Spotting, Camera Calibration | Code available
The MSR-Video to Text Dataset with Clean Annotations | Feb 12, 2021 | Sentence, Video Captioning | Code available
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data | Jan 16, 2024 | Image Generation, Text to Image Generation | Code available
Learning to Generate Grounded Visual Captions without Localization Supervision | Jun 1, 2019 | Image Captioning, Language Modelling | Code available
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching | Nov 17, 2021 | Language Modelling, Video Captioning | Code available
HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning | Dec 19, 2024 | Dense Video Captioning, Video Captioning | Code available
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Jun 30, 2022 | Caption Generation, Video Captioning | Code available
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Nov 25, 2021 | Caption Generation, Question Answering | Code available
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation | Mar 26, 2023 | Video Captioning | Code available
Multi-modal Dense Video Captioning | Mar 17, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available
A Reinforcement Learning Based Encoder-Decoder Framework for Learning Stock Trading Rules | Jan 8, 2021 | Decoder, Deep Reinforcement Learning | Code available
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder, Image Captioning | Code available
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Dec 16, 2023 | Video Captioning, video narration captioning | Code available
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation | Aug 17, 2016 | Caption Generation, Decoder | Code available
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Apr 12, 2024 | Dense Video Captioning, Transfer Learning | Code available
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Apr 1, 2021 | Retrieval, Text Retrieval | Code available
Accurate and Fast Compressed Video Captioning | Sep 22, 2023 | Video Captioning | Code available
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Dec 18, 2024 | Image Captioning, Video Captioning | Code available
Improving Generation and Evaluation of Visual Stories via Semantic Consistency | May 20, 2021 | Image Generation, Story Visualization | Code available
Thinking Hallucination for Video Captioning | Sep 28, 2022 | Hallucination, Video Captioning | Code available
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | Jun 26, 2022 | Contrastive Learning, Diversity | Code available
End-to-End Video Captioning with Multitask Reinforcement Learning | Mar 21, 2018 | GPU, reinforcement-learning | Code available
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Jun 19, 2021 | Dense Video Captioning, Sentence | Code available
End-to-End Dense Video Captioning with Masked Transformer | Apr 3, 2018 | Decoder, Dense Video Captioning | Code available
Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code available
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Sep 7, 2021 | Sensor Fusion, Video Captioning | Code available
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Nov 28, 2022 | FAD, Video Captioning | Code available
Screencast Tutorial Video Understanding | Jun 1, 2020 | object-detection, Object Detection | Code available
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Jul 30, 2024 | Semantic Role Labeling, Video Captioning | Code available
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | Dec 25, 2023 | Image Generation, Text to Image Generation | Code available
Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code available
ECO: Efficient Convolutional Network for Online Video Understanding | Apr 24, 2018 | Action Classification, Action Recognition | Code available
Reconstruction Network for Video Captioning | Mar 30, 2018 | Decoder, Sentence | Code available
Video captioning with stacked attention and semantic hard pull | Sep 15, 2020 | Decoder, Video Captioning | Code available
Dual-Stream Transformer for Generic Event Boundary Captioning | Jul 7, 2022 | Boundary Captioning, Video Captioning | Code available
Accommodating Audio Modality in CLIP for Multimodal Processing | Mar 12, 2023 | AudioCaps, Contrastive Learning | Code available
Pretrained Image-Text Models are Secretly Video Captioners | Feb 19, 2025 | Image Captioning, Video Captioning | Code available
OSVidCap: A Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario | Sep 29, 2021 | Decoder, Open Set Video Captioning | Code available
Diverse Video Captioning by Adaptive Spatio-temporal Attention | Aug 19, 2022 | Decoder, Diversity | Code available