| Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Jun 23, 2022 | Action Recognition, image-classification | Code Available | 1 |
| Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Mar 28, 2022 | Action Classification, Contrastive Learning | Code Available | 1 |
| Time-Contrastive Networks: Self-Supervised Learning from Video | Apr 23, 2017 | Metric Learning, reinforcement-learning | Code Available | 1 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 |
| DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | May 11, 2025 | parameter-efficient fine-tuning, Video Alignment | Unverified | 0 |
| DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Apr 21, 2025 | Attribute, Denoising | Unverified | 0 |
| Deep Understanding of Sign Language for Sign to Subtitle Alignment | Mar 5, 2025 | Translation, Video Alignment | Code Available | 0 |
| Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Jan 1, 2025 | Action Recognition, Scene Recognition | Code Available | 0 |
| Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | Dec 24, 2024 | Audio Generation, Video Alignment | Unverified | 0 |
| Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Dec 3, 2024 | Object, Offline RL | Unverified | 0 |
| VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Nov 22, 2024 | Text-to-Video Generation, Video Alignment | Unverified | 0 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Nov 22, 2024 | Autonomous Driving, Text-to-Video Generation | Code Available | 0 |
| Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | Oct 10, 2024 | Video Alignment, Video Generation | Unverified | 0 |
| Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | Sep 22, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0 |
| Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Sep 6, 2024 | Action Recognition, Contrastive Learning | Code Available | 0 |
| Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | Sep 2, 2024 | Video Alignment, Video Editing | Unverified | 0 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 |
| A Comprehensive Review of Few-shot Action Recognition | Jul 20, 2024 | Action Recognition, Few-Shot action recognition | Unverified | 0 |
| Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports | Jul 11, 2024 | Video Alignment | Unverified | 0 |
| Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Jul 3, 2024 | Contrastive Learning, Language Modelling | Unverified | 0 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 |
| AniClipart: Clipart Animation with Text-to-Video Priors | Apr 18, 2024 | Image to Video Generation, Text-to-Video Generation | Unverified | 0 |
| Scaling Up Video Summarization Pretraining with Large Language Models | Apr 4, 2024 | Video Alignment, Video Summarization | Unverified | 0 |
| The Effects of Short Video-Sharing Services on Video Copy Detection | Mar 26, 2024 | Copy Detection, Video Alignment | Unverified | 0 |
| FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | Mar 10, 2024 | Image Generation, Text-to-Video Editing | Unverified | 0 |