| Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | Mar 18, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Oct 17, 2023 | Benchmarking, Language Modelling | Code Available | 1 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | Benchmarking, Question Answering | Code Available | 1 |
| DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Jun 10, 2025 | Image Captioning, Retrieval | Code Available | 1 |
| Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Sep 8, 2024 | Emotion Recognition, Mamba | Code Available | 1 |
| Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Mar 28, 2022 | Action Classification, Contrastive Learning | Code Available | 1 |
| Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Jan 31, 2025 | Denoising, Video Alignment | Code Available | 1 |
| Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Jun 23, 2022 | Action Recognition, Image Classification | Code Available | 1 |
| ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer | Jun 26, 2023 | Click-Through Rate Prediction, Dynamic Time Warping | Unverified | 0 |
| AniClipart: Clipart Animation with Text-to-Video Priors | Apr 18, 2024 | Image to Video Generation, Text-to-Video Generation | Unverified | 0 |