| Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements | Jun 11, 2025 | Temporal Localization | Unverified | 0 |
| VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Jun 5, 2025 | Autonomous Driving, Autonomous Navigation | Code Available | 2 |
| DisTime: Distribution-based Time Representation for Video Large Language Models | May 30, 2025 | Temporal Localization, Video Understanding | Code Available | 1 |
| Transforming faces into video stories -- VideoFace2.0 | May 4, 2025 | Face Detection, Face Recognition | Code Available | 0 |
| MINERVA: Evaluating Complex Video Reasoning | May 1, 2025 | Benchmarking, Temporal Localization | Code Available | 2 |
| Hierarchical and Multimodal Data for Daily Activity Understanding | Apr 24, 2025 | Action Anticipation, counterfactual | Code Available | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | Unverified | 0 |
| A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports | Apr 15, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Crash Time Matters: HybridMamba for Fine-Grained Temporal Localization in Traffic Surveillance Footage | Apr 4, 2025 | Temporal Localization | Unverified | 0 |
| SocialGesture: Delving into Multi-person Gesture Understanding | Apr 3, 2025 | Gesture Recognition, Question Answering | Unverified | 0 |