| VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Mar 11, 2025 | Image Matting, Video Alignment | Code Available | 1 | 5 |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Mar 22, 2023 | Representation Learning, Sentence | Code Available | 1 | 5 |
| DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Jun 10, 2025 | Image Captioning, Retrieval | Code Available | 1 | 5 |
| A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference | Jun 26, 2023 | Video Alignment | Code Available | 0 | 5 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 | 5 |
| Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video | Oct 21, 2019 | Continuous Control | Code Available | 0 | 5 |
| Dynamic Temporal Alignment of Speech to Lips | Aug 19, 2018 | Constrained Lip-synchronization, Video Alignment | Code Available | 0 | 5 |
| Learning from Video and Text via Large-Scale Discriminative Clustering | Jul 27, 2017 | Action Recognition, Clustering | Code Available | 0 | 5 |
| View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose | Oct 23, 2020 | 3D Pose Estimation, Action Recognition | Code Available | 0 | 5 |
| View-Invariant Probabilistic Embedding for Human Pose | Dec 2, 2019 | Action Recognition, Pose Retrieval | Code Available | 0 | 5 |
| Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | Mar 24, 2023 | Contrastive Learning, Image Retrieval | Code Available | 0 | 5 |
| Deep Understanding of Sign Language for Sign to Subtitle Alignment | Mar 5, 2025 | Translation, Video Alignment | Code Available | 0 | 5 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 | 5 |
| Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Jan 1, 2025 | Action Recognition, Scene Recognition | Code Available | 0 | 5 |
| Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Sep 6, 2024 | Action Recognition, Contrastive Learning | Code Available | 0 | 5 |
| Temporal Cycle-Consistency Learning | Apr 16, 2019 | Anomaly Detection, Representation Learning | Code Available | 0 | 5 |
| LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers | Jun 1, 2018 | Copy Detection, Retrieval | Code Available | 0 | 5 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Nov 22, 2024 | Autonomous Driving, Text-to-Video Generation | Code Available | 0 | 5 |
| Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code Available | 0 | 5 |
| VADER: Video Alignment Differencing and Retrieval | Mar 23, 2023 | Misinformation, Retrieval | Unverified | 0 | 0 |
| A Comprehensive Review of Few-shot Action Recognition | Jul 20, 2024 | Action Recognition, Few-Shot action recognition | Unverified | 0 | 0 |
| Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Jul 3, 2024 | Contrastive Learning, Language Modelling | Unverified | 0 | 0 |
| AniClipart: Clipart Animation with Text-to-Video Priors | Apr 18, 2024 | Image to Video Generation, Text-to-Video Generation | Unverified | 0 | 0 |
| Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | Jul 24, 2023 | Retrieval, Text to Video Retrieval | Unverified | 0 | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 | 0 |