| CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Mar 7, 2024 | Audio-visual Question AnsweringAudio-Visual Question Answering (AVQA) | CodeCode Available | 2 |
| Android in the Zoo: Chain-of-Action-Thought for GUI Agents | Mar 5, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | Mar 5, 2024 | BenchmarkingLanguage Modeling | CodeCode Available | 2 |
| Wukong: Towards a Scaling Law for Large-Scale Recommendation | Mar 4, 2024 | Language ModellingLarge Language Model | CodeCode Available | 2 |
| ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training | Feb 28, 2024 | In-Context LearningLanguage Modeling | CodeCode Available | 2 |
| TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space | Feb 27, 2024 | Contrastive LearningHallucination | CodeCode Available | 2 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Feb 27, 2024 | ClassificationDiagnostic | CodeCode Available | 2 |
| Subobject-level Image Tokenization | Feb 22, 2024 | AttributeLanguage Modeling | CodeCode Available | 2 |
| PALO: A Polyglot Large Multimodal Model for 5B People | Feb 22, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning | Feb 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |