| End-to-end Temporal Action Detection with Transformer | Jun 18, 2021 | Action Detection, Temporal Action Localization | Code Available | 1 |
| Learning the Predictability of the Future | Jun 19, 2021 | Representation Learning, Self-Supervised Action Recognition | Code Available | 1 |
| Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Jun 27, 2024 | Diversity, image-classification | Code Available | 1 |
| End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Sep 27, 2023 | Action Recognition, Action Segmentation | Code Available | 1 |
| Open-Vocabulary Video Relation Extraction | Dec 25, 2023 | Action Classification, Action Understanding | Code Available | 1 |
| End-to-End Referring Video Object Segmentation with Multimodal Transformers | Nov 29, 2021 | Inductive Bias, Instance Segmentation | Code Available | 1 |
| Panoramic Vision Transformer for Saliency Detection in 360° Videos | Sep 19, 2022 | Saliency Detection, Saliency Prediction | Code Available | 1 |
| Learning Temporally Causal Latent Processes from General Temporal Data | Oct 11, 2021 | Causal Discovery, Representation Learning | Code Available | 1 |
| PAVE: Patching and Adapting Video Large Language Models | Mar 25, 2025 | Audio-visual Question Answering, Multi-Task Learning | Code Available | 1 |
| Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Sep 30, 2022 | Descriptive, Representation Learning | Code Available | 1 |
| Learning Optical Flow with Adaptive Graph Reasoning | Feb 8, 2022 | Motion Estimation, Optical Flow Estimation | Code Available | 1 |
| CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Oct 24, 2024 | document understanding, Video Understanding | Code Available | 1 |
| Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Mar 24, 2021 | Action Localization, Temporal Action Localization | Code Available | 1 |
| Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Jan 1, 2025 | audio-visual learning, Knowledge Graphs | Code Available | 1 |
| An overview on the evaluated video retrieval tasks at TRECVID 2022 | Jun 22, 2023 | Ad-hoc video search, Retrieval | Code Available | 1 |
| Procedure-Aware Pretraining for Instructional Video Understanding | Mar 31, 2023 | Video Understanding | Code Available | 1 |
| Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Dec 9, 2021 | Boundary Detection, Diversity | Code Available | 1 |
| Language Repository for Long Video Understanding | Mar 21, 2024 | EgoSchema, Question Answering | Code Available | 1 |
| Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Jan 1, 2021 | Action Recognition, Video Understanding | Code Available | 1 |
| IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | Apr 21, 2025 | Video Understanding | Code Available | 1 |
| A Comprehensive Study of Deep Video Action Recognition | Dec 11, 2020 | Action Recognition, Deep Learning | Code Available | 1 |
| Elaborative Rehearsal for Zero-shot Action Recognition | Aug 5, 2021 | Action Recognition, Few-Shot Learning | Code Available | 1 |
| Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions | May 19, 2022 | Contrastive Learning, Self-Supervised Learning | Code Available | 1 |
| FrameExit: Conditional Early Exiting for Efficient Video Recognition | Apr 27, 2021 | Video Recognition, Video Understanding | Code Available | 1 |
| Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Oct 31, 2024 | Action Segmentation, Action Understanding | Code Available | 1 |