| StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models | Aug 31, 2024 | Video Understanding | —Unverified | 0 | 0 |
| STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition | Jan 8, 2023 | Action Recognition, Facial Expression Recognition (FER) | —Unverified | 0 | 0 |
| StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant | May 8, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Streaming Long Video Understanding with Large Language Models | May 25, 2024 | Question Answering, Video Understanding | —Unverified | 0 | 0 |
| Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring | Aug 31, 2024 | Disaster Response, Video Understanding | —Unverified | 0 | 0 |
| Students taught by multimodal teachers are superior action recognizers | Oct 9, 2022 | Action Recognition, Knowledge Distillation | —Unverified | 0 | 0 |
| Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding | Jun 9, 2025 | Contrastive Learning, Video Editing | —Unverified | 0 | 0 |
| SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis | Jun 9, 2025 | Action Classification, Benchmarking | —Unverified | 0 | 0 |
| SVGraph: Learning Semantic Graphs from Instructional Videos | Jul 16, 2022 | Graph Learning, Video Understanding | —Unverified | 0 | 0 |
| SVT: Supertoken Video Transformer for Efficient Video Understanding | Apr 1, 2023 | Video Understanding | —Unverified | 0 | 0 |
| Dynamics Based Neural Encoding with Inter-Intra Region Connectivity | Feb 19, 2024 | Video Understanding | —Unverified | 0 | 0 |
| System-status-aware Adaptive Network for Online Streaming Video Understanding | Mar 28, 2023 | Streaming video understanding, Video Understanding | —Unverified | 0 | 0 |
| TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations | Sep 5, 2024 | Causal Inference, Position | —Unverified | 0 | 0 |
| Teaching Machines to Understand Baseball Games: Large-Scale Baseball Video Database for Multiple Video Understanding Tasks | Sep 1, 2018 | Video Alignment, Video Recognition | —Unverified | 0 | 0 |
| Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks | Sep 27, 2024 | Action Detection, Action Segmentation | —Unverified | 0 | 0 |
| Temporal Action Detection Model Compression by Progressive Block Drop | Mar 21, 2025 | Action Detection, Autonomous Driving | —Unverified | 0 | 0 |
| Temporal Grounding of Activities using Multimodal Large Language Models | May 30, 2024 | Video Understanding | —Unverified | 0 | 0 |
| Temporally-Adaptive Models for Efficient Video Understanding | Aug 10, 2023 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection | Mar 1, 2022 | Avg, Boundary Detection | —Unverified | 0 | 0 |
| Temporal Preference Optimization for Long-Form Video Understanding | Jan 23, 2025 | Form, MME | —Unverified | 0 | 0 |
| Temporal Query Networks for Fine-grained Video Understanding | Apr 19, 2021 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| t-EVA: Time-Efficient t-SNE Video Annotation | Nov 26, 2020 | Dimensionality Reduction, Video Classification | —Unverified | 0 | 0 |
| Text-Conditioned Resampler For Long Form Video Understanding | Dec 19, 2023 | EgoSchema, Form | —Unverified | 0 | 0 |
| TextVidBench: A Benchmark for Long Video Scene Text Understanding | Jun 5, 2025 | Prompt Engineering, Question Answering | —Unverified | 0 | 0 |
| The Open World of Micro-Videos | Mar 31, 2016 | Diversity, TAG | —Unverified | 0 | 0 |
| Therbligs in Action: Video Understanding through Motion Primitives | Apr 6, 2023 | Action Anticipation, Action Recognition | —Unverified | 0 | 0 |
| The THUMOS Challenge on Action Recognition for Videos "in the Wild" | Apr 21, 2016 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders | May 30, 2025 | Video Understanding | —Unverified | 0 | 0 |
| Time Blindness: Why Video-Language Models Can't See What Humans Can? | May 30, 2025 | Temporal Sequences, Video Understanding | —Unverified | 0 | 0 |
| TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding | Apr 2, 2025 | Video Understanding | —Unverified | 0 | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | —Unverified | 0 | 0 |
| TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | Mar 13, 2025 | Benchmarking, Question Answering | —Unverified | 0 | 0 |
| Toward a Human-Level Video Understanding Intelligence | Oct 8, 2021 | AI Agent, Video Understanding | —Unverified | 0 | 0 |
| Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder | Sep 20, 2024 | Activity Recognition, Diagnostic | —Unverified | 0 | 0 |
| Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking | Apr 11, 2025 | Moment Retrieval, Question Answering | —Unverified | 0 | 0 |
| Towards Fine-Grained Video Question Answering | Mar 10, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Jun 19, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Towards Long Video Understanding via Fine-detailed Video Story Generation | Dec 9, 2024 | Story Generation, Video Understanding | —Unverified | 0 | 0 |
| Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | Mar 17, 2025 | Action Recognition, Video Recognition | —Unverified | 0 | 0 |
| Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition | Jun 9, 2021 | Action Recognition, Point Cloud Classification | —Unverified | 0 | 0 |
| Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | Mar 5, 2025 | Anomaly Detection, Object | —Unverified | 0 | 0 |
| Transformed ROIs for Capturing Visual Transformations in Videos | Jun 6, 2021 | Action Recognition, Video Understanding | —Unverified | 0 | 0 |
| Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images | Dec 7, 2022 | Building change detection for remote sensing images, Change Detection | —Unverified | 0 | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | —Unverified | 0 | 0 |
| TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning | Feb 29, 2024 | Question Answering, Video Understanding | —Unverified | 0 | 0 |
| Two Causally Related Needles in a Video Haystack | May 26, 2025 | Video Understanding, Visual Grounding | —Unverified | 0 | 0 |
| Two-Stream Transformer Architecture for Long Video Understanding | Aug 2, 2022 | Action Recognition, GPU | —Unverified | 0 | 0 |
| UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | Nov 29, 2021 | Boundary Detection, Contrastive Learning | —Unverified | 0 | 0 |
| UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | Jan 1, 2022 | Boundary Detection, Contrastive Learning | —Unverified | 0 | 0 |