SOTAVerified

Active Speaker Localization

Active Speaker Localization (ASL) is the process of spatially localizing an active speaker (talker) in an environment using audio, vision, or both.

Papers

Showing 1-5 of 5 papers

| Title | Status | Hype |
| --- | --- | --- |
| EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception | | 0 |
| Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | | 0 |
| Audio visual character profiles for detecting background characters in entertainment media | | 0 |
| Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization | | 0 |
| Cross modal video representations for weakly supervised active speaker localization | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AV (cor+eng+box) | ASL mAP | 0.86 | | Unverified |