| ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation | Jan 11, 2025 | Chart UnderstandingCode Generation | CodeCode Available | 2 |
| UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation | Jan 9, 2025 | Decision MakingLanguage Modeling | CodeCode Available | 2 |
| OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Jan 8, 2025 | DecoderEmotional Speech Synthesis | CodeCode Available | 2 |
| Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers | Jan 4, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Metadata Conditioning Accelerates Language Model Pre-training | Jan 3, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| FLAME: Financial Large-Language Model Assessment and Metrics Evaluation | Jan 3, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Jan 3, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| TrustRAG: Enhancing Robustness and Trustworthiness in RAG | Jan 1, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Dual Diffusion for Unified Image Generation and Understanding | Dec 31, 2024 | Image GenerationLanguage Modeling | CodeCode Available | 2 |
| Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Dec 30, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Long-Form Speech Generation with Spoken Language Models | Dec 24, 2024 | FormLanguage Modeling | CodeCode Available | 2 |
| Large Language Model Safety: A Holistic Survey | Dec 23, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Large Language Model Enhanced Recommender Systems: A Survey | Dec 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data | Dec 16, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Phi-4 Technical Report | Dec 12, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Granite Guardian | Dec 10, 2024 | HallucinationLanguage Modeling | CodeCode Available | 2 |
| LinVT: Empower Your Image-level Large Language Model to Understand Videos | Dec 6, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| C^2LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation | Dec 6, 2024 | Language Model EvaluationLanguage Modeling | CodeCode Available | 2 |
| FLAIR: VLM with Fine-grained Language-informed Image Representations | Dec 4, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs | Dec 2, 2024 | AllLanguage Modeling | CodeCode Available | 2 |
| KV Shifting Attention Enhances Language Modeling | Nov 29, 2024 | In-Context LearningLanguage Modeling | CodeCode Available | 2 |
| OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection | Nov 26, 2024 | 3D Object DetectionAutonomous Driving | CodeCode Available | 2 |
| MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension | Nov 26, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Nov 26, 2024 | Language ModelingLarge Language Model | CodeCode Available | 2 |