| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Mar 22, 2025 | Benchmarking, Object | Code Available | 0 |
| Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation | May 6, 2024 | Action Segmentation, Skeleton Based Action Segmentation | Code Available | 0 |
| Features Understanding in 3D CNNs for Actions Recognition in Video | Oct 1, 2020 | Action Recognition, Decision Making | Code Available | 0 |
| Situational Scene Graph for Structured Human-centric Situation Understanding | Oct 30, 2024 | Graph Generation, Predicate Classification | Code Available | 0 |
| Exploring Temporal Information for Improved Video Understanding | May 25, 2019 | Action Recognition, Optical Flow Estimation | Code Available | 0 |
| SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding | Apr 30, 2025 | Video Understanding | Code Available | 0 |
| ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding | Oct 1, 2024 | Contrastive Learning, Hallucination | Code Available | 0 |
| Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Dec 18, 2021 | Graph Generation, Object | Code Available | 0 |
| Screencast Tutorial Video Understanding | Jun 1, 2020 | object-detection, Object Detection | Code Available | 0 |
| Video Object Segmentation using Supervoxel-Based Gerrymandering | Apr 18, 2017 | Object, Semantic Segmentation | Code Available | 0 |
| ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | May 29, 2025 | Avg, Video Understanding | Code Available | 0 |
| Representation Flow for Action Recognition | Oct 2, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | Mar 30, 2017 | Action Classification, Action Recognition | Code Available | 0 |
| Relation-aware Hierarchical Attention Framework for Video Question Answering | May 13, 2021 | Question Answering, Relation | Code Available | 0 |
| Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Nov 1, 2021 | Action Recognition, Person Re-Identification | Code Available | 0 |
| Recurrent Space-time Graph Neural Networks | Apr 11, 2019 | Action Recognition, Human-Object Interaction Detection | Code Available | 0 |
| TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos | May 26, 2025 | Attribute, Video Understanding | Code Available | 0 |
| ACVUBench: Audio-Centric Video Understanding Benchmark | Mar 25, 2025 | Video Understanding | Code Available | 0 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | May 30, 2019 | Action Classification, Action Recognition | Code Available | 0 |
| Win-Fail Action Recognition | Feb 15, 2021 | Action Recognition, Action Understanding | Code Available | 0 |
| VideoQA in the Era of LLMs: An Empirical Study | Aug 8, 2024 | Multimodal Large Language Model, Video Question Answering | Code Available | 0 |
| UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark | Oct 2, 2024 | Unusual Activity Localization, Video Understanding | Code Available | 0 |
| ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence Alignment | Jun 28, 2025 | Dynamic Time Warping, Large Language Model | Code Available | 0 |
| EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Jun 17, 2025 | Multi-Instance Retrieval, Retrieval | Code Available | 0 |
| Enhancing Temporal Modeling of Video LLMs via Time Gating | Oct 8, 2024 | MVBench, Question Answering | Code Available | 0 |