| Spatio-Temporal Video Representation Learning for AI Based Video Playback Style Prediction | Oct 3, 2021 | Action Recognition, Representation Learning | Unverified | 0 |
| OBJECT DYNAMICS DISTILLATION FOR SCENE DECOMPOSITION AND REPRESENTATION | Sep 29, 2021 | Object, Predict Future Video Frames | Unverified | 0 |
| Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Sep 23, 2021 | Video Understanding | Code Available | 0 |
| A Multimodal Sentiment Dataset for Video Recommendation | Sep 17, 2021 | Multimodal Sentiment Analysis, Sentiment Analysis | Unverified | 0 |
| Overview of Tencent Multi-modal Ads Video Understanding Challenge | Sep 16, 2021 | Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION | Unverified | 0 |
| Multi-modal Representation Learning for Video Advertisement Content Structuring | Sep 4, 2021 | Representation Learning, Re-Ranking | Unverified | 0 |
| Spatio-Temporal Perturbations for Video Attribution | Sep 1, 2021 | Video Understanding | Code Available | 0 |
| LIGAR: Lightweight General-purpose Action Recognition | Aug 30, 2021 | Action Recognition, Gesture Recognition | Unverified | 0 |
| Identity-aware Graph Memory Network for Action Detection | Aug 26, 2021 | Action Detection, Graph Neural Network | Unverified | 0 |
| Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection | Aug 8, 2021 | Action Detection, Knowledge Distillation | Unverified | 0 |
| O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning | Aug 5, 2021 | Attribute, Caption Generation | Unverified | 0 |
| CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story Understanding | Jul 21, 2021 | Question Answering, Sentence | Unverified | 0 |
| Spatio-Temporal Context for Action Detection | Jun 29, 2021 | Action Detection, Video Understanding | Unverified | 0 |
| Discerning Generic Event Boundaries in Long-Form Wild Videos | Jun 18, 2021 | Boundary Detection, Form | Unverified | 0 |
| Long-Short Temporal Contrastive Learning of Video Transformers | Jun 17, 2021 | Action Recognition, Contrastive Learning | Unverified | 0 |
| C^3: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues | Jun 16, 2021 | Contrastive Learning, counterfactual | Unverified | 0 |
| Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition | Jun 9, 2021 | Action Recognition, Point Cloud Classification | Unverified | 0 |
| Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking | Jun 7, 2021 | Graph Neural Network, Multi-Person Pose Estimation | Unverified | 0 |
| Transformed ROIs for Capturing Visual Transformations in Videos | Jun 6, 2021 | Action Recognition, Video Understanding | Unverified | 0 |
| A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP | May 31, 2021 | Action Recognition, Spatio-temporal Action Recognition | Unverified | 0 |
| Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis | May 28, 2021 | Multimodal Sentiment Analysis, Object Recognition | Unverified | 0 |
| VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | May 20, 2021 | Action Segmentation, Language Modeling | Unverified | 0 |
| Relation-aware Hierarchical Attention Framework for Video Question Answering | May 13, 2021 | Question Answering, Relation | Code Available | 0 |
| Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions | May 10, 2021 | Contrastive Learning, Retrieval | Unverified | 0 |
| Skimming and Scanning for Untrimmed Video Action Recognition | Apr 21, 2021 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| Camera Calibration and Player Localization in SoccerNet-v2 and Investigation of their Representations for Action Spotting | Apr 19, 2021 | Action Spotting, Camera Calibration | Unverified | 0 |
| Temporal Query Networks for Fine-grained Video Understanding | Apr 19, 2021 | Action Classification, Action Recognition | Unverified | 0 |
| Temporally smooth online action detection using cycle-consistent future anticipation | Apr 16, 2021 | Action Detection, Autonomous Driving | Code Available | 0 |
| Adaptive Intermediate Representations for Video Understanding | Apr 14, 2021 | Action Classification, Optical Flow Estimation | Unverified | 0 |
| Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation | Apr 10, 2021 | Object, object-detection | Unverified | 0 |
| FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Apr 9, 2021 | Language Modelling, Multiple-choice | Code Available | 0 |
| M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers | Apr 2, 2021 | Diagnostic, Video Editing | Unverified | 0 |
| Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation | Mar 30, 2021 | Action Detection, Temporal Action Proposal Generation | Unverified | 0 |
| Unified Graph Structured Models for Video Understanding | Mar 29, 2021 | Action Detection, Graph Classification | Unverified | 0 |
| Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization | Mar 28, 2021 | Action Classification, Action Localization | Unverified | 0 |
| ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | Mar 19, 2021 | Object, Referring Expression Segmentation | Unverified | 0 |
| Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | Mar 18, 2021 | Video Understanding | Unverified | 0 |
| PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | Mar 9, 2021 | Action Localization, Boundary Detection | Unverified | 0 |
| Unsupervised Motion Representation Enhanced Network for Action Recognition | Mar 5, 2021 | Action Recognition, Optical Flow Estimation | Unverified | 0 |
| Win-Fail Action Recognition | Feb 15, 2021 | Action Recognition, Action Understanding | Code Available | 0 |
| CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization | Jan 1, 2021 | Action Localization, Imitation Learning | Unverified | 0 |
| Global Self-Attention Networks | Jan 1, 2021 | Video Understanding | Unverified | 0 |
| Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization | Jan 1, 2021 | Action Localization, Video Understanding | Unverified | 0 |
| Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | Jan 1, 2021 | Time Series, Time Series Analysis | Unverified | 0 |
| Understanding Action Sequences based on Video Captioning for Learning-from-Observation | Dec 9, 2020 | Video Captioning, Video Understanding | Unverified | 0 |
| t-EVA: Time-Efficient t-SNE Video Annotation | Nov 26, 2020 | Dimensionality Reduction, Video Classification | Unverified | 0 |
| Can Temporal Information Help with Contrastive Self-Supervised Learning? | Nov 25, 2020 | Data Augmentation, Representation Learning | Unverified | 0 |
| Cycle-Contrast for Self-Supervised Video Representation Learning | Oct 28, 2020 | Action Recognition, Contrastive Learning | Unverified | 0 |
| Co-attentional Transformers for Story-Based Video Understanding | Oct 27, 2020 | Question Answering, Video Question Answering | Unverified | 0 |
| Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset | Oct 15, 2020 | Activity Recognition, Egocentric Activity Recognition | Unverified | 0 |