SOTAVerified

Grounded Situation Recognition

Grounded Situation Recognition aims to produce a structured summary of an image that describes the primary activity (verb), the entities involved in it (nouns), and the bounding-box grounding of each entity.
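As a rough illustration of what such a structured summary might look like in code, the sketch below models one prediction as a verb plus a set of grounded nouns. The class names, the role names ("agent", "food"), the example values, and the (x_min, y_min, x_max, y_max) box convention are all assumptions made for illustration; they are not taken from any listed paper or dataset.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GroundedEntity:
    """One entity (noun) and, if available, where it appears in the image."""
    noun: str
    box: Optional[Box]  # None when the entity has no bounding-box grounding


@dataclass
class Situation:
    """A structured image summary: one verb plus its grounded entities."""
    verb: str
    # Keys are hypothetical role names, used here only to label the entities.
    roles: Dict[str, GroundedEntity]

    def summary(self) -> str:
        # Render the situation as verb(role=noun, ...), in insertion order.
        parts = [f"{role}={entity.noun}" for role, entity in self.roles.items()]
        return f"{self.verb}({', '.join(parts)})"


# Made-up example values, for illustration only.
example = Situation(
    verb="feeding",
    roles={
        "agent": GroundedEntity("person", (12.0, 30.0, 180.0, 400.0)),
        "food": GroundedEntity("hay", None),
    },
)
print(example.summary())  # feeding(agent=person, food=hay)
```

Keeping the box optional lets the same record represent entities that are named in the summary but cannot be localized in the image.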

Papers

Showing 1-15 of 15 papers

Title | Status | Hype
Dynamic Scene Understanding from Vision-Language Representations | - | 0
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer | - | 0
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments | Code | 1
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition | Code | 0
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement | Code | 0
Collaborative Transformers for Grounded Situation Recognition | Code | 1
Rethinking the Two-Stage Framework for Grounded Situation Recognition | Code | 1
Grounded Situation Recognition with Transformers | Code | 1
Attention-Based Context Aware Reasoning for Situation Recognition | Code | 1
Grounded Situation Recognition | Code | 1
Mixture-Kernel Graph Attention Network for Situation Recognition | - | 0
Situation Recognition with Graph Neural Networks | Code | 1
Recurrent Models for Situation Recognition | - | 0
Commonly Uncommon: Semantic Sparsity in Situation Recognition | Code | 0
Situation Recognition: Visual Semantic Role Labeling for Image Understanding | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Ours (CoFormer+) | Top-1 Verb | 58.88 | - | Unverified
2 | ClipSitu | Top-1 Verb | 58.19 | - | Unverified
3 | CoFormer | Top-1 Verb | 44.66 | - | Unverified
4 | SituFormer | Top-1 Verb | 44.2 | - | Unverified
5 | Kernel GraphNet | Top-1 Verb | 43.27 | - | Unverified
6 | GSRTR | Top-1 Verb | 40.63 | - | Unverified
7 | JSL | Top-1 Verb | 39.94 | - | Unverified
8 | ISL | Top-1 Verb | 39.36 | - | Unverified
9 | CAQ + RE-VGG | Top-1 Verb | 38.19 | - | Unverified
10 | GraphNet | Top-1 Verb | 36.72 | - | Unverified
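
The page does not define the Top-1 Verb metric. Assuming it is standard top-1 accuracy over verbs, reported as a percentage, it could be computed roughly as in the sketch below. The function name and the sample labels are hypothetical and are not drawn from any benchmark data.

```python
from typing import Sequence


def top1_verb_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of images whose top predicted verb equals the gold verb.

    Assumes one predicted verb and one gold verb per image, in the same order.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    # Booleans sum as 0/1, so this counts the exact matches.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Made-up labels, for illustration only: 3 of 4 verbs match.
preds = ["riding", "feeding", "jumping", "cooking"]
golds = ["riding", "feeding", "running", "cooking"]
print(top1_verb_accuracy(preds, golds))  # 75.0
```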