| CAST: Cross-Attention in Space and Time for Video Action Recognition | Nov 30, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties | Nov 28, 2023 | In-Context Learning, Video Understanding | Code Available | 1 |
| Panoptic Video Scene Graph Generation | Nov 28, 2023 | Graph Generation, Panoptic Scene Graph Generation | Code Available | 1 |
| Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Nov 27, 2023 | Action Classification, Action Recognition | Code Available | 1 |
| Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding | Nov 25, 2023 | Video Understanding | Code Available | 1 |
| MM-VID: Advancing Video Understanding with GPT-4V(ision) | Oct 30, 2023 | Script Generation, Video Understanding | Code Available | 1 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | Sep 27, 2023 | GPU, Video-based Generative Performance Benchmarking | Code Available | 1 |
| End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Sep 27, 2023 | Action Recognition, Action Segmentation | Code Available | 1 |
| SoccerNet 2023 Challenges Results | Sep 12, 2023 | Action Spotting, Camera Calibration | Code Available | 1 |
| CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Aug 29, 2023 | Federated Learning, image-classification | Code Available | 1 |
| Spherical Vision Transformer for 360-degree Video Saliency Prediction | Aug 24, 2023 | Prediction, Saliency Prediction | Code Available | 1 |
| Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Aug 18, 2023 | point cloud video understanding, Self-Supervised Learning | Code Available | 1 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Aug 17, 2023 | Diagnostic, EgoSchema | Code Available | 1 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder, Object | Code Available | 1 |
| Multimodal Distillation for Egocentric Action Recognition | Jul 14, 2023 | Action Recognition, Knowledge Distillation | Code Available | 1 |
| Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models | Jul 9, 2023 | Question Answering, TGIF-Frame | Code Available | 1 |
| An overview on the evaluated video retrieval tasks at TRECVID 2022 | Jun 22, 2023 | Ad-hoc video search, Retrieval | Code Available | 1 |
| Multi-Granularity Hand Action Detection | Jun 19, 2023 | Action Detection, Action Localization | Code Available | 1 |
| EPIC Fields: Marrying 3D Geometry and Video Understanding | Jun 14, 2023 | 3D geometry, Neural Rendering | Code Available | 1 |
| VideoLLM: Modeling Video Sequence with Large Language Models | May 22, 2023 | Decoder, Video Understanding | Code Available | 1 |
| Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach | May 10, 2023 | Autonomous Vehicles, Monocular Visual Odometry | Code Available | 1 |
| MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Apr 29, 2023 | Decoder, Highlight Detection | Code Available | 1 |
| Event-Free Moving Object Segmentation from Moving Ego Vehicle | Apr 28, 2023 | Autonomous Driving, Benchmarking | Code Available | 1 |
| Leveraging triplet loss for unsupervised action segmentation | Apr 13, 2023 | Action Segmentation, Clustering | Code Available | 1 |
| Procedure-Aware Pretraining for Instructional Video Understanding | Mar 31, 2023 | Video Understanding | Code Available | 1 |