| Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | Sep 30, 2024 | Instruction FollowingLanguage Modeling | CodeCode Available | 2 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | DiversityKeypoint Detection | CodeCode Available | 1 |
| LLMEmb: Large Language Model Can Be a Good Embedding Generator for Sequential Recommendation | Sep 30, 2024 | AttributeCollaborative Filtering | CodeCode Available | 2 |
| VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Sep 30, 2024 | Anomaly DetectionLanguage Modeling | —Unverified | 0 |
| Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval | Sep 30, 2024 | Cross-Modal RetrievalLarge Language Model | CodeCode Available | 0 |
| VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Sep 30, 2024 | EgoSchemaLanguage Modelling | CodeCode Available | 1 |
| MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | Sep 29, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| A multimodal LLM for the non-invasive decoding of spoken text from brain recordings | Sep 29, 2024 | Large Language Model | CodeCode Available | 0 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | AllImage Segmentation | CodeCode Available | 2 |
| See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning | Sep 29, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 |