| Temporal Action Segmentation: An Analysis of Modern Techniques | Oct 19, 2022 | Action Segmentation, Segmentation | Code Available | 2 |
| How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios | Oct 18, 2022 | Video Understanding | Code Available | 0 |
| Self-supervised video pretraining yields robust and more human-aligned visual representations | Oct 12, 2022 | Contrastive Learning, object-detection | Unverified | 0 |
| Students taught by multimodal teachers are superior action recognizers | Oct 9, 2022 | Action Recognition, Knowledge Distillation | Unverified | 0 |
| EgoTaskQA: Understanding Human Tasks in Egocentric Videos | Oct 8, 2022 | Action Localization, counterfactual | Code Available | 1 |
| Compressed Vision for Efficient Video Understanding | Oct 6, 2022 | Video Compression, Video Understanding | Unverified | 0 |
| SoccerNet 2022 Challenges Results | Oct 5, 2022 | Action Spotting, Camera Calibration | Code Available | 1 |
| Learning to Focus on the Foreground for Temporal Sentence Grounding | Oct 1, 2022 | Sentence, Temporal Sentence Grounding | Unverified | 0 |
| In-the-Wild Video Question Answering | Oct 1, 2022 | Evidence Selection, Question Answering | Unverified | 0 |
| Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Sep 30, 2022 | Descriptive, Representation Learning | Code Available | 1 |
| Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain | Sep 29, 2022 | Action Recognition, Video Understanding | Unverified | 0 |
| Streaming Video Temporal Action Segmentation In Real Time | Sep 28, 2022 | Action Segmentation, Language Modelling | Code Available | 1 |
| AVT: Audio-Video Transformer for Multimodal Action Recognition | Sep 22, 2022 | Action Recognition, Audio Classification | Unverified | 0 |
| UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Sep 22, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| Panoramic Vision Transformer for Saliency Detection in 360° Videos | Sep 19, 2022 | Saliency Detection, Saliency Prediction | Code Available | 1 |
| WildQA: In-the-Wild Video Question Answering | Sep 14, 2022 | Evidence Selection, Question Answering | Unverified | 0 |
| EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography | Sep 9, 2022 | Video Understanding | Code Available | 1 |
| Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions | Sep 7, 2022 | Image Generation, Text to Image Generation | Unverified | 0 |
| Visual Subtitle Feature Enhanced Video Outline Generation | Aug 24, 2022 | Articles, Headline Generation | Unverified | 0 |
| Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding | Aug 22, 2022 | Action Recognition, Multi-Task Learning | Unverified | 0 |
| DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations | Aug 17, 2022 | Camera Calibration, Instance Segmentation | Code Available | 1 |
| Motion Sensitive Contrastive Learning for Self-supervised Video Representation | Aug 12, 2022 | Contrastive Learning, Representation Learning | Unverified | 0 |
| Exploring Anchor-based Detection for Ego4D Natural Language Query | Aug 10, 2022 | Video Understanding | Unverified | 0 |
| SA-NET.v2: Real-time vehicle detection from oblique UAV images with use of uncertainty estimation in deep meta-learning | Aug 4, 2022 | Meta-Learning, Semantic Segmentation | Unverified | 0 |
| Two-Stream Transformer Architecture for Long Video Understanding | Aug 2, 2022 | Action Recognition, GPU | Unverified | 0 |
| BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation | Aug 1, 2022 | Object, Optical Flow Estimation | Unverified | 0 |
| Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding | Jul 30, 2022 | point cloud video understanding, Video Understanding | Code Available | 1 |
| Static and Dynamic Concepts for Self-supervised Video Representation Learning | Jul 26, 2022 | Diversity, Representation Learning | Code Available | 1 |
| EgoEnv: Human-centric environment representations from egocentric video | Jul 22, 2022 | Video Understanding | Unverified | 0 |
| Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022 | Jul 22, 2022 | Object, Object State Change Classification | Unverified | 0 |
| AE-Net:Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding | Jul 21, 2022 | Action Recognition, Video Understanding | Unverified | 0 |
| An Efficient Spatio-Temporal Pyramid Transformer for Action Detection | Jul 21, 2022 | Action Detection, Video Understanding | Unverified | 0 |
| Spotting Temporally Precise, Fine-Grained Events in Video | Jul 20, 2022 | Action Detection, Action Spotting | Code Available | 1 |
| Clover: Towards A Unified Video-Language Alignment and Fusion Model | Jul 16, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| SVGraph: Learning Semantic Graphs from Instructional Videos | Jul 16, 2022 | Graph Learning, Video Understanding | Unverified | 0 |
| Is Appearance Free Action Recognition Possible? | Jul 13, 2022 | Action Recognition, Optical Flow Estimation | Code Available | 1 |
| Federated Self-supervised Learning for Video Understanding | Jul 5, 2022 | Action Recognition, Federated Learning | Code Available | 1 |
| GraphVid: It Only Takes a Few Nodes to Understand a Video | Jul 4, 2022 | Superpixels, Video Understanding | Unverified | 0 |
| Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering | Jul 1, 2022 | Question Answering, Video Question Answering | Unverified | 0 |
| Multimodal Intent Discovery from Livestream Videos | Jul 1, 2022 | Intent Discovery, Video Summarization | Unverified | 0 |
| (Un)likelihood Training for Interpretable Embedding | Jul 1, 2022 | Ad-hoc video search, Decoder | Code Available | 0 |
| Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Jun 30, 2022 | Boundary Detection, Generic Event Boundary Detection | Code Available | 0 |
| Technical Report for CVPR 2022 LOVEU AQTC Challenge | Jun 29, 2022 | Video Understanding | Code Available | 0 |
| ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Jun 27, 2022 | Action Classification, Action Recognition | Code Available | 1 |
| REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Jun 18, 2022 | Decoder, Semantic Segmentation | Code Available | 1 |
| Multimodal Dialogue State Tracking | Jun 16, 2022 | Dialogue State Tracking, Video Understanding | Code Available | 0 |
| Stand-Alone Inter-Frame Attention in Video Models | Jun 14, 2022 | Action Classification, Action Recognition | Code Available | 1 |
| Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens | Jun 13, 2022 | Action Recognition, Video Understanding | Unverified | 0 |
| A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Jun 7, 2022 | Action Classification, Action Detection | Code Available | 1 |
| Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey | Jun 5, 2022 | 3D Hand Pose Estimation, Domain Adaptation | Unverified | 0 |