| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | Unverified | 0 |
| Efficient Motion-Aware Video MLLM | Jan 1, 2025 | Question Answering, Video Question Answering | Unverified | 0 |
| Flexible Frame Selection for Efficient Video Reasoning | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Dec 30, 2024 | Contrastive Learning, Question Answering | Unverified | 0 |
| Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries | Dec 26, 2024 | Question Answering, Video Question Answering | Unverified | 0 |
| VidCtx: Context-aware Video Question Answering with Image Models | Dec 23, 2024 | Large Language Model, Question Answering | Code Available | 0 |
| FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos | Dec 22, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| PolySmart @ TRECVid 2024 Medical Video Question Answering | Dec 20, 2024 | Question Answering, Retrieval | Unverified | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | Unverified | 0 |
| IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs | Dec 13, 2024 | Question Answering, Video Question Answering | Unverified | 0 |
| Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering | Dec 12, 2024 | feature selection, Language Modeling | Unverified | 0 |
| Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | Dec 6, 2024 | document understanding, Hallucination | Unverified | 0 |
| SEAL: Semantic Attention Learning for Long Video Representation | Dec 2, 2024 | Diversity, Question Answering | Unverified | 0 |
| Unlocking Video-LLM via Agent-of-Thoughts Distillation | Dec 2, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks | Dec 2, 2024 | Multi-Object Tracking, Object Tracking | Code Available | 0 |
| Actions and Objects Pathways for Domain Adaptation in Video Question Answering | Nov 29, 2024 | Domain Adaptation, Domain Generalization | Unverified | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | Unverified | 0 |
| HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | Nov 27, 2024 | Graph Generation, Question Answering | Unverified | 0 |
| VideoOrion: Tokenizing Object Dynamics in Videos | Nov 25, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Nov 19, 2024 | GPU, Question Answering | Unverified | 0 |
| EVQAScore: Efficient Video Question Answering Data Evaluation | Nov 11, 2024 | Keyword Extraction, Question Answering | Unverified | 0 |
| Poze: Sports Technique Feedback under Data Constraints | Nov 8, 2024 | Pose Estimation, Question Answering | Unverified | 0 |
| FLAASH: Flow-Attention Adaptive Semantic Hierarchical Fusion for Multi-Modal Tobacco Content Analysis | Oct 25, 2024 | Question Answering, Video Question Answering | Unverified | 0 |
| GPT-4o System Card | Oct 25, 2024 | Multiple-choice, Spatial Reasoning | Unverified | 0 |