SOTAVerified

Scene Graph Generation

A scene graph is a structured representation of an image in which nodes correspond to object bounding boxes labeled with their object categories, and edges correspond to pairwise relationships between those objects. The task of Scene Graph Generation is to generate a visually grounded scene graph that most accurately correlates with the image.

Source: Scene Graph Generation by Iterative Message Passing
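
To make the definition above concrete, here is a minimal sketch of a scene graph as a data structure. The class names (ObjectNode, RelationEdge, SceneGraph), the (x1, y1, x2, y2) box convention, and the example labels are illustrative assumptions; they are not taken from the cited paper or from any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed convention: (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class ObjectNode:
    """A node: one object bounding box plus its category label."""
    label: str
    box: Box


@dataclass(frozen=True)
class RelationEdge:
    """A directed edge: subject node index, predicate, object node index."""
    subject: int
    predicate: str
    object: int


@dataclass
class SceneGraph:
    """Hypothetical container holding the nodes and edges for one image."""
    nodes: List[ObjectNode] = field(default_factory=list)
    edges: List[RelationEdge] = field(default_factory=list)

    def add_object(self, label: str, box: Box) -> int:
        """Append a node and return its index."""
        self.nodes.append(ObjectNode(label, box))
        return len(self.nodes) - 1

    def relate(self, subject: int, predicate: str, obj: int) -> None:
        """Add a pairwise relationship between two existing nodes."""
        for idx in (subject, obj):
            if not 0 <= idx < len(self.nodes):
                raise IndexError(f"no node with index {idx}")
        self.edges.append(RelationEdge(subject, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return edges as (subject label, predicate, object label) triplets."""
        return [
            (self.nodes[e.subject].label, e.predicate, self.nodes[e.object].label)
            for e in self.edges
        ]


# Example: a made-up image containing a man riding a horse on grass.
graph = SceneGraph()
man = graph.add_object("man", (120.0, 40.0, 260.0, 300.0))
horse = graph.add_object("horse", (80.0, 150.0, 400.0, 420.0))
grass = graph.add_object("grass", (0.0, 380.0, 640.0, 480.0))
graph.relate(man, "riding", horse)
graph.relate(horse, "standing on", grass)
print(graph.triplets())
```

A generation model would predict the nodes and edges from pixels; the sketch only shows the target structure that such a model is asked to produce.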

Papers

Showing 1-50 of 318 papers

Title | Status | Hype
4D Panoptic Scene Graph Generation | Code | 3
Open World Scene Graph Generation using Vision Language Models | Code | 2
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Code | 2
RelationField: Relate Anything in Radiance Fields | Code | 2
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding | Code | 2
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery | Code | 2
REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation | Code | 2
EGTR: Extracting Graph from Transformer for Scene Graph Generation | Code | 2
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Code | 2
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | Code | 2
SGTR+: End-to-end Scene Graph Generation with Transformer | Code | 2
Panoptic Scene Graph Generation | Code | 2
RelTR: Relation Transformer for Scene Graph Generation | Code | 2
Unbiased Scene Graph Generation from Biased Training | Code | 2
Learning to Compose Dynamic Tree Structures for Visual Contexts | Code | 2
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding | Code | 1
DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation | Code | 1
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision | Code | 1
RA-SGG: Retrieval-Augmented Scene Graph Generation Framework via Multi-Prototype Learning | Code | 1
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models | Code | 1
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Code | 1
Scene Graph Generation with Role-Playing Large Language Models | Code | 1
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation | Code | 1
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models | Code | 1
A Fair Ranking and New Model for Panoptic Scene Graph Generation | Code | 1
Leveraging Predicate and Triplet Learning for Scene Graph Generation | Code | 1
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation | Code | 1
A Review and Efficient Implementation of Scene Graph Generation Metrics | Code | 1
ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling | Code | 1
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Code | 1
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection | Code | 1
Towards Scene Graph Anticipation | Code | 1
Adaptive Self-training Framework for Fine-grained Scene Graph Generation | Code | 1
Panoptic Video Scene Graph Generation | Code | 1
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation | Code | 1
Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge | Code | 1
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention | Code | 1
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Code | 1
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation | Code | 1
Less is More: Toward Zero-Shot Local Scene Graph Generation via Foundation Models | Code | 1
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation | Code | 1
Zero-Shot Scene Graph Generation via Triplet Calibration and Reduction | Code | 1
Vision Relation Transformer for Unbiased Scene Graph Generation | Code | 1
RLIPv2: Fast Scaling of Relational Language-Image Pre-training | Code | 1
Compositional Feature Augmentation for Unbiased Scene Graph Generation | Code | 1
Panoptic Scene Graph Generation with Semantics-Prototype Learning | Code | 1
Pair then Relation: Pair-Net for Panoptic Scene Graph Generation | Code | 1
Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection | Code | 1
Unbiased Scene Graph Generation in Videos | Code | 1
SPAN: Learning Similarity between Scene Graphs and Images with Transformers | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ExpressiveSGG | R@100 | 39.12 | - | Unverified
2 | NeuSyRE | R@100 | 39.1 | - | Unverified
3 | KnowZRel | zR@100 | 35.65 | - | Unverified
4 | SpeaQ (without reweighting) | Recall@50 | 32.9 | - | Unverified
5 | SpeaQ (with reweighting) | Recall@50 | 32.1 | - | Unverified
6 | Causal-TDE | Recall@50 | 31.93 | - | Unverified
7 | SG-EBM | Recall@50 | 31.74 | - | Unverified
8 | GPS-Net | Recall@50 | 28.9 | - | Unverified
9 | LOGIN | Recall@50 | 28.2 | - | Unverified
10 | VCTree | Recall@50 | 27.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ORacle | F1 | 0.91 | - | Unverified
2 | MM2SG | F1 | 0.9 | - | Unverified
3 | Pix2SG | F1 | 0.9 | - | Unverified
4 | LABRAD-OR | F1 | 0.88 | - | Unverified
5 | 4D-OR baseline | F1 | 0.75 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SceneGraphFusion | Top-5 Accuracy | 0.87 | - | Unverified
2 | 3DSSG [Wald2020_3dssg] | Top-5 Accuracy | 0.66 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | FactorizableNet | Recall@50 | 18.32 | - | Unverified
2 | VRD | Recall@50 | 18.16 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | KnowZRel | zR@100 | 29.56 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MM2SG | Macro F1 | 0.53 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | NeuSyRE | R@100 | 38.5 | - | Unverified