| Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | Jul 2, 2025 | Video Understanding | —Unverified | 0 |
| Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling | Sep 21, 2018 | General Classification, Video Classification | —Unverified | 0 |
| Large Scale Video Representation Learning via Relational Graph Clustering | Jun 1, 2020 | Clustering, Graph Clustering | —Unverified | 0 |
| Large-Scale YouTube-8M Video Understanding with Deep Neural Networks | Jun 14, 2017 | Classification, General Classification | —Unverified | 0 |
| LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | Apr 15, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection | Aug 8, 2021 | Action Detection, Knowledge Distillation | —Unverified | 0 |
| Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | Apr 3, 2025 | Information Retrieval, Representation Learning | —Unverified | 0 |
| Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer | Sep 19, 2023 | Anatomy, Computational Efficiency | —Unverified | 0 |
| Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking | Jun 7, 2021 | Graph Neural Network, Multi-Person Pose Estimation | —Unverified | 0 |
| Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment | Jun 8, 2023 | Video Understanding | —Unverified | 0 |
| Learning from Multiple Sources for Video Summarisation | Jan 13, 2015 | Clustering, Video Understanding | —Unverified | 0 |
| Learning Higher-order Object Interactions for Keypoint-based Video Understanding | May 16, 2023 | Action Localization, Action Recognition | —Unverified | 0 |
| Learning Object State Changes in Videos: An Open-World Perspective | Dec 19, 2023 | Video Understanding | —Unverified | 0 |
| Learning reusable concepts across different egocentric video understanding tasks | May 30, 2025 | Video Understanding | —Unverified | 0 |
| Learning Space-Time Semantic Correspondences | Jun 16, 2023 | Imitation Learning, Semantic correspondence | —Unverified | 0 |
| Learning text-to-video retrieval from image captioning | Apr 26, 2024 | Image Captioning, Image Retrieval | —Unverified | 0 |
| Learning to Focus on the Foreground for Temporal Sentence Grounding | Oct 1, 2022 | Sentence, Temporal Sentence Grounding | —Unverified | 0 |
| Learning to Visually Connect Actions and their Effects | Jan 19, 2024 | Object Tracking, Task Planning | —Unverified | 0 |
| Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition | Jun 14, 2017 | Action Recognition, Optical Flow Estimation | —Unverified | 0 |
| Less than Few: Self-Shot Video Instance Segmentation | Apr 19, 2022 | Few-Shot Learning, Instance Segmentation | —Unverified | 0 |
| Leveraging Foundation Models for Multimodal Graph-Based Action Recognition | May 21, 2025 | Action Recognition, Graph Attention | —Unverified | 0 |
| Leveraging Local Temporal Information for Multimodal Scene Classification | Oct 26, 2021 | Classification, Scene Classification | —Unverified | 0 |
| LIGAR: Lightweight General-purpose Action Recognition | Aug 30, 2021 | Action Recognition, Gesture Recognition | —Unverified | 0 |
| LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | May 21, 2025 | Autonomous Driving, Question Answering | —Unverified | 0 |
| LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | Nov 29, 2021 | Diversity, Question Answering | —Unverified | 0 |
| LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs | Mar 14, 2025 | Video Understanding | —Unverified | 0 |
| LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | Jan 9, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | Jun 13, 2024 | Benchmarking, Human-Object Interaction Detection | —Unverified | 0 |
| LLM4Brain: Training a Large Language Model for Brain Video Understanding | Sep 26, 2024 | Domain Adaptation, Language Modeling | —Unverified | 0 |
| LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs | Feb 21, 2024 | Question Answering, Video Question Answering | —Unverified | 0 |
| Localizing Events in Videos with Multimodal Queries | Jun 14, 2024 | Natural Language Queries, Video Understanding | —Unverified | 0 |
| Localizing Unseen Activities in Video via Image Query | Jun 28, 2019 | Action Localization, Video Understanding | —Unverified | 0 |
| Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | Mar 17, 2025 | Attribute, MME | —Unverified | 0 |
| Long Activity Video Understanding using Functional Object-Oriented Network | Jul 3, 2018 | Object, Video Understanding | —Unverified | 0 |
| LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | Feb 21, 2025 | Caption Generation, Video Captioning | —Unverified | 0 |
| Long-Short Temporal Contrastive Learning of Video Transformers | Jun 17, 2021 | Action Recognition, Contrastive Learning | —Unverified | 0 |
| LongVILA: Scaling Long-Context Visual Language Models for Long Videos | Aug 19, 2024 | Video Captioning, Video Question Answering | —Unverified | 0 |
| LongViTU: Instruction Tuning for Long-Form Video Understanding | Jan 9, 2025 | EgoSchema, Form | —Unverified | 0 |
| Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | Mar 17, 2025 | Form, GPU | —Unverified | 0 |
| Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | Nov 29, 2024 | All, Form | —Unverified | 0 |
| Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization | Mar 28, 2021 | Action Classification, Action Localization | —Unverified | 0 |
| LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents | Mar 13, 2025 | Computational Efficiency, Optical Character Recognition (OCR) | —Unverified | 0 |
| LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | Feb 4, 2025 | GPU, Video Understanding | —Unverified | 0 |
| M^33D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding | Sep 26, 2023 | 2D Semantic Segmentation, Action Detection | —Unverified | 0 |
| M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition | Aug 6, 2023 | Action Recognition, Decision Making | —Unverified | 0 |
| MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection | May 29, 2025 | image-classification, Image Classification | —Unverified | 0 |
| Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling | Oct 19, 2024 | Video Understanding | —Unverified | 0 |
| MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | May 23, 2024 | Action Recognition, Action Segmentation | —Unverified | 0 |
| MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models | Jun 16, 2025 | Video Understanding | —Unverified | 0 |
| MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations | Mar 20, 2025 | Hallucination, Video Understanding | —Unverified | 0 |