| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Oct 17, 2023 | Benchmarking, Language Modelling | Code Available | 1 |
| Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Jun 15, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | May 18, 2023 | Image Generation, Text to Image Generation | Code Available | 1 |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Mar 22, 2023 | Representation Learning, Sentence | Code Available | 1 |
| Learning a Grammar Inducer from Massive Uncurated Instructional Videos | Oct 22, 2022 | Language Acquisition, Video Alignment | Code Available | 1 |
| Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Jun 23, 2022 | Action Recognition, image-classification | Code Available | 1 |
| Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Mar 28, 2022 | Action Classification, Contrastive Learning | Code Available | 1 |
| Time-Contrastive Networks: Self-Supervised Learning from Video | Apr 23, 2017 | Metric Learning, reinforcement-learning | Code Available | 1 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 |
| DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | May 11, 2025 | parameter-efficient fine-tuning, Video Alignment | Unverified | 0 |