| Sylber: Syllabic Embedding Representation of Speech from Raw Audio | Oct 9, 2024 | Language Modeling | Code Available | 2 |
| Towards Interpreting Visual Information Processing in Vision-Language Models | Oct 9, 2024 | Language Modeling | Code Available | 2 |
| BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation | Oct 8, 2024 | Language Modeling | Code Available | 2 |
| Think While You Generate: Discrete Diffusion with Planned Denoising | Oct 8, 2024 | Denoising, Image Generation | Code Available | 2 |
| PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling | Oct 8, 2024 | Document Understanding, Language Modeling | Code Available | 2 |
| Differential Transformer | Oct 7, 2024 | Hallucination, In-Context Learning | Code Available | 2 |
| Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Oct 7, 2024 | Causal Inference, Counterfactual | Code Available | 2 |
| TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language Modeling | Code Available | 2 |
| GenSim: A General Social Simulation Platform with Large Language Model based Agents | Oct 6, 2024 | Language Modeling | Code Available | 2 |
| SyllableLM: Learning Coarse Semantic Units for Speech Language Models | Oct 5, 2024 | Clustering, Language Modeling | Code Available | 2 |