SOTAVerified

visual instruction following

Papers

Showing 1-24 of 24 papers

Title | Status | Hype
Visual Instruction Tuning | Code | 6
Improved Baselines with Visual Instruction Tuning | Code | 6
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4
MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning | Code | 3
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding | Code | 2
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | Code | 2
Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models | Code | 2
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Code | 2
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding | Code | 2
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels? | Code | 1
Joint Embeddings for Graph Instruction Tuning | | 0
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models | | 0
FaceGPT: Self-supervised Learning to Chat about 3D Human Faces | | 0
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | | 0
Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | | 0
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | | 0
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation | | 0
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification | | 0
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags | | 0
Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | | 0
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | Code | 0
MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection | Code | 0
Instruction Clarification Requests in Multimodal Collaborative Dialogue Games: Tasks, and an Analysis of the CoDraw Dataset | Code | 0

No leaderboard results yet.