SOTAVerified

audio-visual event localization

Papers

Showing 11–20 of 26 papers

Title | Status | Hype
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization | | 0
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser | Code | 1
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline | Code | 1
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization | | 0
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization | Code | 0
Past and Future Motion Guided Network for Audio Visual Event Localization | | 0
ActionFormer: Localizing Moments of Actions with Transformers | Code | 2
Cross-Modal Background Suppression for Audio-Visual Event Localization | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
Multi-Modulation Network for Audio-Visual Event Localization | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | UnAV | mAP | 47.8 | | Unverified
2 | ActionFormer | mAP | 42.2 | | Unverified