SOTAVerified

3D Question Answering (3D-QA)

A 3D-QA task requires a model to answer a question given the full information of a 3D scene. The model works from 3D spatial input such as RGB-D scans or point cloud data. We also require the model to specify the 3D bounding boxes of the objects relevant to the question. This prevents it from answering from the textual priors of the training questions without examining the scene. Unlike 3D dense captioning, however, we do not require the model to target a single described object for each question, because answering certain questions can involve multiple objects. For example, the question “What color are the chairs around the table?” concerns several objects, and it is answerable as long as the chairs around the unique table in the scene share the same color. In such cases, we require the model to answer the question while identifying multiple 3D bounding boxes.
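The requirement above, an answer plus one or more 3D bounding boxes, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `QAPrediction`, `box_iou`, and `is_grounded` are hypothetical, boxes are taken to be axis-aligned `(xmin, ymin, zmin, xmax, ymax, zmax)` tuples, and the matching rule (every ground-truth box covered by some predicted box at IoU 0.5 or higher) is not the official metric of any benchmark listed here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: axis-aligned (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Tuple[float, float, float, float, float, float]


@dataclass
class QAPrediction:
    """Hypothetical container: a textual answer plus the boxes it refers to."""
    question: str
    answer: str
    boxes: List[Box]


def box_volume(b: Box) -> float:
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)


def is_grounded(pred: QAPrediction, gt_boxes: List[Box],
                iou_threshold: float = 0.5) -> bool:
    """Assumed rule: every ground-truth box is matched by a predicted box."""
    return all(
        any(box_iou(p, g) >= iou_threshold for p in pred.boxes)
        for g in gt_boxes
    )


# Two chairs around the table, both referenced by one answer.
chairs = [(0.0, 0.0, 0.0, 1.0, 1.0, 1.0), (2.0, 0.0, 0.0, 3.0, 1.0, 1.0)]
pred = QAPrediction(
    question="What color are the chairs around the table?",
    answer="brown",
    boxes=list(chairs),
)
print(is_grounded(pred, chairs))
```

Keeping `boxes` as a list rather than a single box reflects the point made above: one question may refer to several objects at once.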

Title	Date	Tasks	Status	Hype
Visual Instruction Tuning	Apr 17, 2023	1 Image, 2*2 Stitching; 3D Question Answering (3D-QA)	Code Available	6
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry; 3D Object Captioning	Code Available	3
3D-LLM: Injecting the 3D World into Large Language Models	Jul 24, 2023	3D Object Captioning; 3D Question Answering (3D-QA)	Code Available	3
An Embodied Generalist Agent in 3D World	Nov 18, 2023	3D dense captioning; 3D Question Answering (3D-QA)	Code Available	2
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers	Dec 13, 2023	3D Question Answering (3D-QA); Attribute	Code Available	2
PointLLM: Empowering Large Language Models to Understand Point Clouds	Aug 31, 2023	3D Object Captioning; 3D Object Classification	Code Available	2
Multi-modal Situated Reasoning in 3D Scenes	Sep 4, 2024	3D Question Answering (3D-QA)	Code Available	2
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment	Aug 8, 2023	3D Question Answering (3D-QA); Dense Captioning	Code Available	2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA); Diagnostic	Code Available	2
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following	Sep 1, 2023	3D Generation; 3D Question Answering (3D-QA)	Code Available	2
