| MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Jun 13, 2024 | Instruction Following, Math | Code Available | 3 |
| FaceGPT: Self-supervised Learning to Chat about 3D Human Faces | Jun 11, 2024 | 3D Face Reconstruction, Face Model | Unverified | 0 |
| Joint Embeddings for Graph Instruction Tuning | May 31, 2024 | Instruction Following, visual instruction following | Unverified | 0 |
| Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | May 27, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | May 16, 2024 | Decision Making, Instruction Following | Unverified | 0 |
| CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | May 9, 2024 | Image Captioning, Instruction Following | Code Available | 2 |
| Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models | Mar 19, 2024 | Instruction Following, visual instruction following | Code Available | 2 |
| Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels? | Nov 29, 2023 | In-Context Learning, Instruction Following | Code Available | 1 |
| ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Nov 21, 2023 | Descriptive, MME | Code Available | 0 |
| Improved Baselines with Visual Instruction Tuning | Oct 5, 2023 | Factual Inconsistency Detection in Chart Captioning, Image Classification | Code Available | 6 |