SOTAVerified

Scene Graph Generation

A scene graph is a structured representation of an image in which nodes correspond to object bounding boxes labeled with their object categories, and edges correspond to pairwise relationships between those objects. The task of Scene Graph Generation is to generate a visually grounded scene graph that most accurately correlates with an image.

Source: Scene Graph Generation by Iterative Message Passing
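
To make the definition above concrete, here is a minimal sketch of a scene graph as a data structure. The class names (`ObjectNode`, `RelationEdge`, `SceneGraph`), the `(x_min, y_min, x_max, y_max)` box convention, and the example objects are illustrative assumptions, not taken from the cited paper or any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class ObjectNode:
    """A node: one object bounding box with its category label."""
    category: str
    box: Box


@dataclass(frozen=True)
class RelationEdge:
    """A directed edge: subject node index, predicate label, object node index."""
    subject: int
    predicate: str
    obj: int


@dataclass
class SceneGraph:
    nodes: List[ObjectNode] = field(default_factory=list)
    edges: List[RelationEdge] = field(default_factory=list)

    def add_object(self, category: str, box: Box) -> int:
        """Append a node and return its index."""
        self.nodes.append(ObjectNode(category, box))
        return len(self.nodes) - 1

    def add_relation(self, subject: int, predicate: str, obj: int) -> None:
        """Append an edge between two existing nodes."""
        for index in (subject, obj):
            if not 0 <= index < len(self.nodes):
                raise IndexError(f"no node with index {index}")
        self.edges.append(RelationEdge(subject, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return (subject category, predicate, object category) for each edge."""
        return [
            (self.nodes[e.subject].category, e.predicate, self.nodes[e.obj].category)
            for e in self.edges
        ]


# Made-up example image content, for illustration only.
graph = SceneGraph()
man = graph.add_object("man", (10.0, 20.0, 110.0, 220.0))
horse = graph.add_object("horse", (60.0, 90.0, 260.0, 300.0))
graph.add_relation(man, "riding", horse)
print(graph.triplets())
```

Storing edges as index pairs rather than category names keeps two objects of the same category distinct, since each node is tied to its own bounding box.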

Papers

Showing 1-25 of 318 papers

Title | Status | Hype
4D Panoptic Scene Graph Generation | Code | 3
Open World Scene Graph Generation using Vision Language Models | Code | 2
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Code | 2
RelationField: Relate Anything in Radiance Fields | Code | 2
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Code | 2
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Code | 2
REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation | Code | 2
EGTR: Extracting Graph from Transformer for Scene Graph Generation | Code | 2
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Code | 2
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | Code | 2
SGTR+: End-to-end Scene Graph Generation with Transformer | Code | 2
Panoptic Scene Graph Generation | Code | 2
RelTR: Relation Transformer for Scene Graph Generation | Code | 2
Unbiased Scene Graph Generation from Biased Training | Code | 2
Learning to Compose Dynamic Tree Structures for Visual Contexts | Code | 2
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Code | 1
DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation | Code | 1
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision | Code | 1
RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning | Code | 1
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Code | 1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
Scene Graph Generation with Role-Playing Large Language Models | Code | 1
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation | Code | 1
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Code | 1
A Fair Ranking and New Model for Panoptic Scene Graph Generation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ExpressiveSGG | R@100 | 39.12 | | Unverified
2 | NeuSyRE | R@100 | 39.1 | | Unverified
3 | KnowZRelz | R@100 | 35.65 | | Unverified
4 | SpeaQ (without reweighting) | Recall@50 | 32.9 | | Unverified
5 | SpeaQ (with reweighting) | Recall@50 | 32.1 | | Unverified
6 | Causal-TDE | Recall@50 | 31.93 | | Unverified
7 | SG-EBM | Recall@50 | 31.74 | | Unverified
8 | GPS-Net | Recall@50 | 28.9 | | Unverified
9 | LOGIN | Recall@50 | 28.2 | | Unverified
10 | VCTree | Recall@50 | 27.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ORacle | F1 | 0.91 | | Unverified
2 | MM2SG | F1 | 0.9 | | Unverified
3 | Pix2SG | F1 | 0.9 | | Unverified
4 | LABRAD-OR | F1 | 0.88 | | Unverified
5 | 4D-OR baseline | F1 | 0.75 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SceneGraphFusion | Top-5 Accuracy | 0.87 | | Unverified
2 | 3DSSG [Wald2020_3dssg] | Top-5 Accuracy | 0.66 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FactorizableNet | Recall@50 | 18.32 | | Unverified
2 | VRD | Recall@50 | 18.16 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | KnowZRelz | R@100 | 29.56 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MM2SG | Macro F1 | 0.53 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | NeuSyRE | R@100 | 38.5 | | Unverified