| Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations | Mar 25, 2025 | Representation Learning, Video Understanding | Code Available | 0 | 5 |
| ECO: Efficient Convolutional Network for Online Video Understanding | Apr 24, 2018 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| (Un)likelihood Training for Interpretable Embedding | Jul 1, 2022 | Ad-hoc video search, Decoder | Code Available | 0 | 5 |
| Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images | Oct 4, 2018 | Domain Adaptation, Image-to-Image Translation | Code Available | 0 | 5 |
| UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark | Oct 2, 2024 | Unusual Activity Localization, Video Understanding | Code Available | 0 | 5 |
| ACVUBench: Audio-Centric Video Understanding Benchmark | Mar 25, 2025 | Video Understanding | Code Available | 0 | 5 |
| TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | May 26, 2025 | Attribute, Video Understanding | Code Available | 0 | 5 |
| TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | Mar 30, 2017 | Action Classification, Action Recognition | Code Available | 0 | 5 |
| Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Jun 15, 2024 | Question Answering, Video Understanding | Code Available | 0 | 5 |
| Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Apr 29, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Apr 18, 2020 | Anomaly Detection, Classification | Code Available | 0 | 5 |
| DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | May 7, 2020 | Question Answering, Video Question Answering | Code Available | 0 | 5 |
| Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Jan 8, 2024 | object-detection, Object Detection | Code Available | 0 | 5 |
| Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Mar 28, 2024 | Data Augmentation, Diversity | Code Available | 0 | 5 |
| Don't Judge by the Look: Towards Motion Coherent Video Representation | Mar 14, 2024 | Data Augmentation, Object Recognition | Code Available | 0 | 5 |
| Tiny Video Networks | Oct 15, 2019 | CPU, GPU | Code Available | 0 | 5 |
| The YouTube-8M Kaggle Competition: Challenges and Methods | Jun 28, 2017 | General Classification, Video Classification | Code Available | 0 | 5 |
| The Visual Centrifuge: Model-Free Layered Video Representations | Dec 4, 2018 | Color Constancy, model | Code Available | 0 | 5 |
| Temporal Tessellation: A Unified Approach for Video Analysis | Dec 21, 2016 | Action Detection, Video Captioning | Code Available | 0 | 5 |
| The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge | Jun 16, 2017 | General Classification, Video Classification | Code Available | 0 | 5 |
| In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Apr 14, 2024 | Action Recognition, Hand Pose Estimation | Code Available | 0 | 5 |
| Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding | Jul 14, 2017 | Video Recognition, Video Understanding | Code Available | 0 | 5 |
| Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding | May 19, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Temporally smooth online action detection using cycle-consistent future anticipation | Apr 16, 2021 | Action Detection, Autonomous Driving | Code Available | 0 | 5 |
| Temporal Action Proposal Generation With Action Frequency Adaptive Network | Jun 23, 2023 | Knowledge Distillation, Temporal Action Proposal Generation | Code Available | 0 | 5 |
| A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | May 16, 2023 | Emotion Classification, Question Answering | Code Available | 0 | 5 |
| Diagnosing Error in Temporal Action Detectors | Jul 27, 2018 | Action Localization, Diagnostic | Code Available | 0 | 5 |
| Telling Stories for Common Sense Zero-Shot Action Recognition | Sep 29, 2023 | Action Recognition, Articles | Code Available | 0 | 5 |
| ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Jun 28, 2023 | Retrieval, Video Retrieval | Code Available | 0 | 5 |
| Technical Report for CVPR 2022 LOVEU AQTC Challenge | Jun 29, 2022 | Video Understanding | Code Available | 0 | 5 |
| 4D Generic Video Object Proposals | Jan 26, 2019 | Instance Segmentation, Object | Code Available | 0 | 5 |
| Detection-Fusion for Knowledge Graph Extraction from Videos | Dec 30, 2024 | Knowledge Graphs, Language Modeling | Code Available | 0 | 5 |
| https://arxiv.org/abs/2407.00634 | Jul 2, 2024 | Video Captioning, Video Description | Code Available | 0 | 5 |
| Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Jun 1, 2023 | Incremental Learning, Knowledge Distillation | Code Available | 0 | 5 |
| How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios | Oct 18, 2022 | Video Understanding | Code Available | 0 | 5 |
| Detect-and-Track: Efficient Pose Estimation in Videos | Dec 26, 2017 | Human Detection, Keypoint Estimation | Code Available | 0 | 5 |
| Task-Aware KV Compression For Cost-Effective Long Video Understanding | Jun 26, 2025 | Video Understanding | Code Available | 0 | 5 |
| HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Jun 11, 2025 | Action Recognition, Action Segmentation | Code Available | 0 | 5 |
| Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Jun 30, 2022 | Boundary Detection, Generic Event Boundary Detection | Code Available | 0 | 5 |
| HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding | Jan 3, 2025 | Question Answering, Video Understanding | Code Available | 0 | 5 |
| Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Dec 7, 2021 | Contrastive Learning, Representation Learning | Code Available | 0 | 5 |
| Deep Learning Methods for Efficient Large Scale Video Labeling | Jun 14, 2017 | Deep Learning, Video Understanding | Code Available | 0 | 5 |
| Hierarchical Deep Recurrent Architecture for Video Understanding | Jul 11, 2017 | Classification, General Classification | Code Available | 0 | 5 |
| Streaming Detection of Queried Event Start | Dec 4, 2024 | Autonomous Driving, parameter-efficient fine-tuning | Code Available | 0 | 5 |
| Video action detection by learning graph-based spatio-temporal interactions | Dec 9, 2019 | Action Detection, Action Localization | Code Available | 0 | 5 |
| HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Jul 9, 2023 | Action Recognition, Action Segmentation | Code Available | 0 | 5 |
| Spatio-Temporal Perturbations for Video Attribution | Sep 1, 2021 | Video Understanding | Code Available | 0 | 5 |
| Hallucination Mitigation Prompts Long-term Video Understanding | Jun 17, 2024 | Answer Generation, Hallucination | Code Available | 0 | 5 |
| SoccerNet 2024 Challenges Results | Sep 16, 2024 | Action Spotting, Dense Video Captioning | Code Available | 0 | 5 |
| Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | May 6, 2024 | Action Segmentation, Skeleton Based Action Segmentation | Code Available | 0 | 5 |