| Judging a video by its bitstream cover | Sep 14, 2023 | Video Understanding | Code Available | 0 |
| SoccerNet 2023 Challenges Results | Sep 12, 2023 | Action Spotting, Camera Calibration | Code Available | 1 |
| CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Aug 29, 2023 | Federated Learning, image-classification | Code Available | 1 |
| Spherical Vision Transformer for 360-degree Video Saliency Prediction | Aug 24, 2023 | Prediction, Saliency Prediction | Code Available | 1 |
| Motion-Guided Masking for Spatiotemporal Representation Learning | Aug 24, 2023 | Domain Adaptation, Representation Learning | Unverified | 0 |
| MOFO: MOtion FOcused Self-Supervision for Video Understanding | Aug 23, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| Are current long-term video understanding datasets long-term? | Aug 22, 2023 | Action Recognition, Video Understanding | Code Available | 0 |
| Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Aug 18, 2023 | point cloud video understanding, Self-Supervised Learning | Code Available | 1 |
| Audio-Visual Glance Network for Efficient Video Recognition | Aug 18, 2023 | Video Recognition, Video Understanding | Unverified | 0 |
| EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Aug 17, 2023 | Diagnostic, EgoSchema | Code Available | 1 |
| Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Aug 15, 2023 | Decoder, Object | Code Available | 1 |
| Temporally-Adaptive Models for Efficient Video Understanding | Aug 10, 2023 | Action Classification, Action Recognition | Unverified | 0 |
| M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition | Aug 6, 2023 | Action Recognition, Decision Making | Unverified | 0 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Jul 31, 2023 | Multiple-choice, Question Answering | Code Available | 2 |
| DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | Jul 31, 2023 | Action Segmentation, Human-Object Interaction Detection | Unverified | 0 |
| A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | Jul 18, 2023 | Knowledge Distillation, object-detection | Code Available | 2 |
| Multimodal Distillation for Egocentric Action Recognition | Jul 14, 2023 | Action Recognition, Knowledge Distillation | Code Available | 1 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Jul 13, 2023 | Action Recognition, Contrastive Learning | Unverified | 0 |
| HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding | Jul 9, 2023 | Action Recognition, Action Segmentation | Code Available | 0 |
| Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models | Jul 9, 2023 | Question Answering, TGIF-Frame | Code Available | 1 |
| VideoGLUE: Video General Understanding Evaluation of Foundation Models | Jul 6, 2023 | Action Recognition, Temporal Localization | Unverified | 0 |
| ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Jun 28, 2023 | Retrieval, Video Retrieval | Code Available | 0 |
| Temporal Action Proposal Generation With Action Frequency Adaptive Network | Jun 23, 2023 | Knowledge Distillation, Temporal Action Proposal Generation | Code Available | 0 |
| An overview on the evaluated video retrieval tasks at TRECVID 2022 | Jun 22, 2023 | Ad-hoc video search, Retrieval | Code Available | 1 |
| Multi-Granularity Hand Action Detection | Jun 19, 2023 | Action Detection, Action Localization | Code Available | 1 |