
Live video captioning (LVC) involves detecting and describing dense events within video streams. Traditional dense video captioning approaches are typically offline: the entire video is available to the captioning model before it produces any output. In contrast, the LVC paradigm requires models to generate captions online, as the stream arrives. This imposes significant constraints, such as having to work from incomplete observations of the video and to perform temporal anticipation.
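The online constraint can be illustrated with a minimal sketch. It is not taken from any LVC paper: the names `caption_stream`, `toy_describe`, `window`, and `stride` are hypothetical, frames are represented as plain strings, and the "model" is a placeholder. The only point it shows is that each caption is produced from frames already seen, never from future ones.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

Frame = str  # hypothetical stand-in for a decoded video frame


def caption_stream(
    frames: Iterable[Frame],
    describe: Callable[[List[Frame]], str],
    window: int = 4,
    stride: int = 2,
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, caption) pairs using only frames seen so far."""
    # Bounded buffer: the captioner keeps a short memory of the past
    # and has no access to frames that have not arrived yet.
    buffer: Deque[Frame] = deque(maxlen=window)
    for index, frame in enumerate(frames):
        buffer.append(frame)
        # Emit a caption every `stride` frames, without looking ahead.
        if (index + 1) % stride == 0:
            yield index, describe(list(buffer))


def toy_describe(clip: List[Frame]) -> str:
    # Placeholder for a real captioning model (assumption for illustration).
    return "saw: " + ", ".join(clip)


captions = list(
    caption_stream(["a", "b", "c", "d", "e", "f"], toy_describe, window=3, stride=2)
)
for index, text in captions:
    print(index, text)
```

An offline dense captioner would instead receive the full list of frames in a single call; here the generator yields each caption as soon as enough of the stream has been observed.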

Live Video Captioning

Papers