
A 3D-QA task requires a model to answer a question given the full information of a 3D scene. The model works from 3D spatial input such as RGB-D scans or point clouds. We also require the model to output the 3D bounding boxes of the objects related to the question. This prevents the model from answering by relying on textual priors learned from the training questions without examining the scene. However, unlike 3D dense captioning, we do not require the model to target a single described object per question, because some questions can only be answered by considering multiple objects. For example, the question “What color are the chairs around the table?” concerns multiple objects. It has a single answer as long as the scene contains only one table and the chairs around it all share the same color. In such cases, we require the model to answer the question and localize all of the relevant 3D bounding boxes.
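To make the output requirement concrete, here is a minimal sketch of one 3D-QA sample whose answer depends on several objects, together with a check of a prediction against it. Everything in it is an illustrative assumption: the class and function names (`Box3D`, `QASample`, `is_correct`), the axis-aligned min/max box format, the greedy box matching, and the 0.25 IoU threshold. It is not the ScanQA data format or its official evaluation protocol.

```python
from dataclasses import dataclass, field


# Hypothetical axis-aligned 3D box stored as min and max corners (x, y, z).
# Real datasets may store center + size or oriented boxes instead.
@dataclass(frozen=True)
class Box3D:
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]

    def volume(self) -> float:
        dx, dy, dz = (hi - lo for lo, hi in zip(self.min_xyz, self.max_xyz))
        return max(dx, 0.0) * max(dy, 0.0) * max(dz, 0.0)


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.min_xyz, a.max_xyz, b.min_xyz, b.max_xyz):
        overlap = min(a_hi, b_hi) - max(a_lo, b_lo)
        if overlap <= 0:
            return 0.0  # no overlap along this axis, so no intersection at all
        inter *= overlap
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


@dataclass
class QASample:
    """Hypothetical 3D-QA example: a question about a scene, its accepted
    answers, and every object box the answer depends on (possibly several)."""
    scene_id: str
    question: str
    answers: list[str]
    target_boxes: list[Box3D] = field(default_factory=list)


def is_correct(sample: QASample, pred_answer: str,
               pred_boxes: list[Box3D], iou_thresh: float = 0.25) -> bool:
    """Hypothetical scoring rule: the answer must be accepted AND every target
    box must be matched by a distinct predicted box with IoU >= iou_thresh.
    Matching is greedy (best remaining prediction per target), not optimal."""
    if pred_answer not in sample.answers:
        return False
    unused = list(pred_boxes)
    for gt in sample.target_boxes:
        best = max(unused, key=lambda p: iou_3d(p, gt), default=None)
        if best is None or iou_3d(best, gt) < iou_thresh:
            return False  # this relevant object was not localized
        unused.remove(best)
    return True


if __name__ == "__main__":
    chair_a = Box3D((0, 0, 0), (1, 1, 1))
    chair_b = Box3D((2, 0, 0), (3, 1, 1))
    sample = QASample("example_scene",
                      "What color are the chairs around the table?",
                      ["brown"], [chair_a, chair_b])
    print(is_correct(sample, "brown", [chair_a, chair_b]))  # True
    print(is_correct(sample, "brown", [chair_a]))           # False: second chair missed
```

The second call fails even though the answer text is right, which mirrors the requirement that a model must ground every object the question refers to rather than answer from textual priors alone.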

Title	Date	Tasks	Status	Hype
ScanQA: 3D Question Answering for Spatial Scene Understanding	Dec 20, 2021	3D Question Answering (3D-QA), Object	Code Available	1
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans	Dec 3, 2020	3D dense captioning, 3D Object Detection	Unverified	0

3D Question Answering (3D-QA)
