| Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Jun 1, 2023 | Incremental Learning, Knowledge Distillation | Code Available | 0 | 5 |
| Task-Aware KV Compression For Cost-Effective Long Video Understanding | Jun 26, 2025 | Video Understanding | Code Available | 0 | 5 |
| TAda! Temporally-Adaptive Convolutions for Video Understanding | Oct 12, 2021 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Jul 9, 2023 | Action Recognition, Action Segmentation | Code Available | 0 | 5 |
| Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Dec 7, 2021 | Contrastive Learning, Representation Learning | Code Available | 0 | 5 |
| Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Jun 30, 2022 | Boundary Detection, Generic Event Boundary Detection | Code Available | 0 | 5 |
| Hallucination Mitigation Prompts Long-term Video Understanding | Jun 17, 2024 | Answer Generation, Hallucination | Code Available | 0 | 5 |
| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Mar 22, 2025 | Benchmarking, Object | Code Available | 0 | 5 |
| Video action detection by learning graph-based spatio-temporal interactions | Dec 9, 2019 | Action Detection, Action Localization | Code Available | 0 | 5 |
| Streaming Detection of Queried Event Start | Dec 4, 2024 | Autonomous Driving, parameter-efficient fine-tuning | Code Available | 0 | 5 |
| SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting, Dense Video Captioning | Code Available | 0 | 5 |
| SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | May 22, 2025 | Action Classification, Automatic Speech Recognition | Code Available | 0 | 5 |
| Creative Flow+ Dataset | Jun 1, 2019 | 3D Character Animation From A Single Photo, Depth Estimation | Code Available | 0 | 5 |
| Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | May 6, 2024 | Action Segmentation, Skeleton Based Action Segmentation | Code Available | 0 | 5 |
| Situational Scene Graph for Structured Human-centric Situation Understanding | Oct 30, 2024 | Graph Generation, Predicate Classification | Code Available | 0 | 5 |
| SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding | Apr 30, 2025 | Video Understanding | Code Available | 0 | 5 |
| Gaussian Temporal Awareness Networks for Action Localization | Sep 9, 2019 | Action Localization, object-detection | Code Available | 0 | 5 |
| DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | May 7, 2020 | Question Answering, Video Question Answering | Code Available | 0 | 5 |
| Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing | Mar 13, 2025 | EgoSchema, Form | Code Available | 0 | 5 |
| DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Apr 18, 2020 | Anomaly Detection, Classification | Code Available | 0 | 5 |
| ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | May 29, 2025 | Avg, Video Understanding | Code Available | 0 | 5 |
| Screencast Tutorial Video Understanding | Jun 1, 2020 | object-detection, Object Detection | Code Available | 0 | 5 |
| Representation Flow for Action Recognition | Oct 2, 2018 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| Contextual Explainable Video Representation: Human Perception-based Understanding | Dec 12, 2022 | Action Detection, Action Recognition | Code Available | 0 | 5 |
| ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding | Oct 1, 2024 | Contrastive Learning, Hallucination | Code Available | 0 | 5 |
| Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Nov 1, 2021 | Action Recognition, Person Re-Identification | Code Available | 0 | 5 |
| Recurrent Space-time Graph Neural Networks | Apr 11, 2019 | Action Recognition, Human-Object Interaction Detection | Code Available | 0 | 5 |
| FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos | Dec 22, 2024 | Language Modelling, Large Language Model | Code Available | 0 | 5 |
| Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge | Aug 21, 2018 | Video Understanding | Code Available | 0 | 5 |
| VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Dec 8, 2019 | Action Recognition, Data Augmentation | Code Available | 0 | 5 |
| Relation-aware Hierarchical Attention Framework for Video Question Answering | May 13, 2021 | Question Answering, Relation | Code Available | 0 | 5 |
| SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Dec 10, 2019 | Action Classification, Action Detection | Code Available | 0 | 5 |
| Pooled Motion Features for First-Person Videos | Dec 19, 2014 | Activity Recognition, Activity Recognition In Videos | Code Available | 0 | 5 |
| A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Feb 6, 2022 | Video Compression, Video Understanding | Code Available | 0 | 5 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Mar 24, 2022 | Action Recognition, Retrieval | Code Available | 0 | 5 |
| ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Jun 28, 2025 | Dynamic Time Warping, Large Language Model | Code Available | 0 | 5 |
| Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Sep 23, 2021 | Video Understanding | Code Available | 0 | 5 |
| FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Apr 9, 2021 | Language Modelling, Multiple-choice | Code Available | 0 | 5 |
| AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Jun 16, 2025 | Optical Character Recognition (OCR), RAG | Code Available | 0 | 5 |
| On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Mar 15, 2022 | Video Understanding | Code Available | 0 | 5 |
| Few-Shot Referring Relationships in Videos | Jan 1, 2023 | Object, Relation Network | Code Available | 0 | 5 |
| Features Understanding in 3D CNNs for Actions Recognition in Video | Oct 1, 2020 | Action Recognition, Decision Making | Code Available | 0 | 5 |
| A Context-Aware Loss Function for Action Spotting in Soccer Videos | Dec 3, 2019 | Action Spotting, Video Understanding | Code Available | 0 | 5 |
| OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Nov 24, 2024 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| Spatio-Temporal Perturbations for Video Attribution | Sep 1, 2021 | Video Understanding | Code Available | 0 | 5 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | May 30, 2019 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Nov 12, 2018 | Efficient Neural Network, General Classification | Code Available | 0 | 5 |
| Exploring Temporal Information for Improved Video Understanding | May 25, 2019 | Action Recognition, Optical Flow Estimation | Code Available | 0 | 5 |
| Multimodal Dialogue State Tracking | Jun 16, 2022 | Dialogue State Tracking, Video Understanding | Code Available | 0 | 5 |
| Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Dec 18, 2021 | Graph Generation, Object | Code Available | 0 | 5 |