SOTAVerified

3D Question Answering (3D-QA)

A 3D-QA task requires a model to answer a question given the full information of a 3D scene. Models draw on 3D spatial information such as RGB-D scans or point cloud data. We also require models to specify the 3D bounding boxes of the objects relevant to answering the question. This prevents models from answering by relying on textual priors in the training questions without examining the scene. However, unlike 3D dense captioning, we do not require models to target a single described object for each question, because certain questions involve multiple objects. For example, the question “What color is the chairs around the table?” relates to multiple objects. This question is answerable as long as the scene contains a unique table and the chairs around it share the same color. In such scenarios, we require models to answer the question while identifying multiple 3D bounding boxes.
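As a rough illustration of the output described above, the sketch below pairs a textual answer with several 3D bounding boxes and compares boxes by intersection-over-union. All names here (`Box3D`, `QAPrediction`, `iou_3d`) and the axis-aligned min/max corner format are assumptions made for this example; they are not taken from any specific 3D-QA dataset or benchmark, which may use oriented boxes and different matching rules.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box given by min and max corners (hypothetical format)."""
    min_corner: Point
    max_corner: Point

    def volume(self) -> float:
        v = 1.0
        for lo, hi in zip(self.min_corner, self.max_corner):
            v *= max(0.0, hi - lo)  # degenerate extents contribute zero volume
        return v


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(
        a.min_corner, a.max_corner, b.min_corner, b.max_corner
    ):
        # Overlap along one axis; zero if the boxes do not overlap on it.
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


@dataclass
class QAPrediction:
    """An answer plus every box the model considers relevant to the question."""
    question: str
    answer: str
    boxes: List[Box3D] = field(default_factory=list)


# One answer grounded in two chair boxes (coordinates are made up).
chairs = [
    Box3D((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    Box3D((2.0, 0.0, 0.0), (3.0, 1.0, 1.0)),
]
pred = QAPrediction("What color is the chairs around the table?", "brown", chairs)
```

Keeping `boxes` as a list rather than a single box reflects the point made above: a question may relate to several objects at once.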

Title	Date	Tasks	Status	Hype
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis	Mar 28, 2025	3D Question Answering (3D-QA), 3D visual grounding	Code Available	1
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering	Mar 5, 2025	3D Question Answering (3D-QA), Question Answering	Code Available	1
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	Nov 30, 2024	3D Question Answering (3D-QA), Position	Code Available	0
Video Instruction Tuning With Synthetic Data	Oct 3, 2024	3D Question Answering (3D-QA)	Unverified	0
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness	Sep 26, 2024	3D Question Answering (3D-QA), Position	Unverified	0
Multi-modal Situated Reasoning in 3D Scenes	Sep 4, 2024	3D Question Answering (3D-QA)	Code Available	2
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available	0
Unifying 3D Vision-Language Understanding via Promptable Queries	May 19, 2024	3D Question Answering (3D-QA), Decoder	Unverified	0
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning	Mar 18, 2024	3D Question Answering (3D-QA), Dense Captioning	Unverified	0
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Feb 27, 2024	3D geometry, 3D Object Captioning	Code Available	3
