| Temporally smooth online action detection using cycle-consistent future anticipation | Apr 16, 2021 | Action Detection, Autonomous Driving | Code Available | 0 |
| HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Jul 9, 2023 | Action Recognition, Action Segmentation | Code Available | 0 |
| Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Apr 29, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Temporal Action Proposal Generation With Action Frequency Adaptive Network | Jun 23, 2023 | Knowledge Distillation, Temporal Action Proposal Generation | Code Available | 0 |
| A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | May 16, 2023 | Emotion Classification, Question Answering | Code Available | 0 |
| Telling Stories for Common Sense Zero-Shot Action Recognition | Sep 29, 2023 | Action Recognition, Articles | Code Available | 0 |
| Technical Report for CVPR 2022 LOVEU AQTC Challenge | Jun 29, 2022 | Video Understanding | Code Available | 0 |
| Tiny Video Networks | Oct 15, 2019 | CPU, GPU | Code Available | 0 |
| Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Jun 1, 2023 | Incremental Learning, Knowledge Distillation | Code Available | 0 |
| Task-Aware KV Compression For Cost-Effective Long Video Understanding | Jun 26, 2025 | Video Understanding | Code Available | 0 |
| TAda! Temporally-Adaptive Convolutions for Video Understanding | Oct 12, 2021 | Action Classification, Action Recognition | Code Available | 0 |
| Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Dec 7, 2021 | Contrastive Learning, Representation Learning | Code Available | 0 |
| Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Jun 30, 2022 | Boundary Detection, Generic Event Boundary Detection | Code Available | 0 |
| Streaming Detection of Queried Event Start | Dec 4, 2024 | Autonomous Driving, parameter-efficient fine-tuning | Code Available | 0 |
| Hallucination Mitigation Prompts Long-term Video Understanding | Jun 17, 2024 | Answer Generation, Hallucination | Code Available | 0 |
| Gaussian Temporal Awareness Networks for Action Localization | Sep 9, 2019 | Action Localization, object-detection | Code Available | 0 |
| FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos | Dec 22, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| Video action detection by learning graph-based spatio-temporal interactions | Dec 9, 2019 | Action Detection, Action Localization | Code Available | 0 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Mar 24, 2022 | Action Recognition, Retrieval | Code Available | 0 |
| Spatio-Temporal Perturbations for Video Attribution | Sep 1, 2021 | Video Understanding | Code Available | 0 |
| FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Apr 9, 2021 | Language Modelling, Multiple-choice | Code Available | 0 |
| SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting, Dense Video Captioning | Code Available | 0 |
| Few-Shot Referring Relationships in Videos | Jan 1, 2023 | Object, Relation Network | Code Available | 0 |
| Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Mar 28, 2024 | Data Augmentation, Diversity | Code Available | 0 |
| SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | May 22, 2025 | Action Classification, Automatic Speech Recognition | Code Available | 0 |
| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Mar 22, 2025 | Benchmarking, Object | Code Available | 0 |
| Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | May 6, 2024 | Action Segmentation, Skeleton Based Action Segmentation | Code Available | 0 |
| Features Understanding in 3D CNNs for Actions Recognition in Video | Oct 1, 2020 | Action Recognition, Decision Making | Code Available | 0 |
| Situational Scene Graph for Structured Human-centric Situation Understanding | Oct 30, 2024 | Graph Generation, Predicate Classification | Code Available | 0 |
| Exploring Temporal Information for Improved Video Understanding | May 25, 2019 | Action Recognition, Optical Flow Estimation | Code Available | 0 |
| SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding | Apr 30, 2025 | Video Understanding | Code Available | 0 |
| ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding | Oct 1, 2024 | Contrastive Learning, Hallucination | Code Available | 0 |
| Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Dec 18, 2021 | Graph Generation, Object | Code Available | 0 |
| Screencast Tutorial Video Understanding | Jun 1, 2020 | object-detection, Object Detection | Code Available | 0 |
| Video Object Segmentation using Supervoxel-Based Gerrymandering | Apr 18, 2017 | Object, Semantic Segmentation | Code Available | 0 |
| ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | May 29, 2025 | Avg, Video Understanding | Code Available | 0 |
| Representation Flow for Action Recognition | Oct 2, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | Mar 30, 2017 | Action Classification, Action Recognition | Code Available | 0 |
| Relation-aware Hierarchical Attention Framework for Video Question Answering | May 13, 2021 | Question Answering, Relation | Code Available | 0 |
| Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Nov 1, 2021 | Action Recognition, Person Re-Identification | Code Available | 0 |
| Recurrent Space-time Graph Neural Networks | Apr 11, 2019 | Action Recognition, Human-Object Interaction Detection | Code Available | 0 |
| TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | May 26, 2025 | Attribute, Video Understanding | Code Available | 0 |
| ACVUBench: Audio-Centric Video Understanding Benchmark | Mar 25, 2025 | Video Understanding | Code Available | 0 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | May 30, 2019 | Action Classification, Action Recognition | Code Available | 0 |
| Win-Fail Action Recognition | Feb 15, 2021 | Action Recognition, Action Understanding | Code Available | 0 |
| VideoQA in the Era of LLMs: An Empirical Study | Aug 8, 2024 | Multimodal Large Language Model, Video Question Answering | Code Available | 0 |
| UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark | Oct 2, 2024 | Unusual Activity Localization, Video Understanding | Code Available | 0 |
| ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Jun 28, 2025 | Dynamic Time Warping, Large Language Model | Code Available | 0 |
| EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Jun 17, 2025 | Multi-Instance Retrieval, Retrieval | Code Available | 0 |
| Enhancing Temporal Modeling of Video LLMs via Time Gating | Oct 8, 2024 | MVBench, Question Answering | Code Available | 0 |