| LongProLIP: A Probabilistic Vision-Language Model with Long Context Text | Mar 11, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Mellow: a small audio language model for reasoning | Mar 11, 2025 | Audio captioning, Language Modeling | Code Available | 2 |
| When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Mar 10, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| DiffCLIP: Differential Attention Meets CLIP | Mar 9, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality Assessment, Language Modeling | Code Available | 2 |
| A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | Mar 7, 2025 | Information Retrieval, Language Modeling | Code Available | 2 |
| PromptPex: Automatic Test Generation for Language Model Prompts | Mar 7, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Generalized Interpolating Discrete Diffusion | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM | Mar 6, 2025 | Anomaly Detection, Language Modeling | Code Available | 2 |
| Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 |