SOTAVerified

Action Anticipation

Next-action anticipation is defined as observing frames 1, ..., T of a video and predicting the action that begins after a gap of T_a seconds. Note that the anticipated action is a new action, starting after the T_a-second gap, that does not appear in the observed frames. Here, T_a = 1 second.
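The timing convention above can be sketched in a few lines. This is an illustrative helper (the function name, frame rate, and defaults are assumptions, not part of the benchmark definition), mapping the observed window and the anticipation gap T_a to frame indices:

```python
# Hypothetical sketch of the anticipation setup: observe frames 1..T,
# then predict the action that starts T_a seconds after the last
# observed frame. FPS and names are illustrative assumptions.

def anticipation_window(T: int, fps: float, t_a: float = 1.0):
    """Return (last observed frame, first frame of the anticipated action)."""
    gap_frames = int(round(t_a * fps))
    last_observed = T
    action_start = T + gap_frames  # the new action begins after the T_a-second gap
    return last_observed, action_start

# Example: 30 fps video, 60 observed frames, 1-second anticipation gap.
print(anticipation_window(60, 30.0))  # (60, 90)
```

The model never sees frames in the gap or the action itself; it must predict the upcoming action label from the observed window alone.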

Papers

Showing 1–50 of 110 papers

Title | Status | Hype
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Code | 7
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World | Code | 2
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Code | 2
Learning State-Aware Visual Representations from Audible Interactions | Code | 1
Future Transformer for Long-term Action Anticipation | Code | 1
MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation | Code | 1
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Code | 1
Real-time Online Video Detection with Temporal Smoothing Transformers | Code | 1
Rethinking Learning Approaches for Long-Term Action Anticipation | Code | 1
Pedestrian 3D Bounding Box Prediction | Code | 1
Action Anticipation with Goal Consistency | Code | 1
Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation | Code | 1
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? | Code | 1
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation | Code | 1
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention | Code | 1
Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video | Code | 1
Technical Report: Temporal Aggregate Representations | Code | 1
Intention-Conditioned Long-Term Human Egocentric Action Forecasting | Code | 1
Multimodal Large Models Are Effective Action Anticipators | Code | 1
Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs | Code | 1
Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Code | 1
Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023 | Code | 1
A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents | Code | 1
Semantically Guided Representation Learning For Action Anticipation | Code | 1
Anticipative Video Transformer | Code | 1
Temporal Aggregate Representations for Long-Range Video Understanding | Code | 1
Rescaling Egocentric Vision | Code | 1
Higher Order Recurrent Space-Time Transformer for Video Action Prediction | Code | 1
Video + CLIP Baseline for Ego4D Long-term Action Anticipation | Code | 1
Video Representation Learning with Visual Tempo Consistency | Code | 1
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention | Code | 0
Encouraging LSTMs to Anticipate Actions Very Early | Code | 0
Text-Derived Knowledge Helps Vision: A Simple Cross-modal Distillation for Video-based Action Anticipation | Code | 0
TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021 | Code | 0
Technical Report for Ego4D Long Term Action Anticipation Challenge 2023 | Code | 0
Interaction Region Visual Transformer for Egocentric Action Anticipation | Code | 0
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset | Code | 0
Unified Recurrence Modeling for Video Action Anticipation | Code | 0
RED: Reinforced Encoder-Decoder Networks for Action Anticipation | Code | 0
Predicting the Next Action by Modeling the Abstract Goal | Code | 0
QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View | Code | 0
Object-centric Video Representation for Long-term Action Anticipation | Code | 0
Hierarchical and Multimodal Data for Daily Activity Understanding | Code | 0
HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN | Code | 0
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities | Code | 0
Mamba Fusion: Learning Actions Through Questioning | Code | 0
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video | Code | 0
Action Anticipation from SoccerNet Football Video Broadcasts | Code | 0
Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos | Code | 0
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PlausiVL | Recall@5 | 27.6 | - | Unverified
2 | InAViT | Recall@5 | 25.89 | - | Unverified
3 | UADT | Recall@5 | 23 | - | Unverified
4 | S-GEAR | Recall@5 | 19.9 | - | Unverified
5 | AFFT | Recall@5 | 18.5 | - | Unverified
6 | MeMViT-24 | Recall@5 | 17.7 | - | Unverified
7 | AVT+ | Recall@5 | 15.9 | - | Unverified
8 | TempAgg | Recall@5 | 14.73 | - | Unverified
9 | RU-LSTM | Recall@5 | 13.94 | - | Unverified
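For reference, "Recall@5" on anticipation leaderboards like these is commonly computed as class-mean top-5 recall. The sketch below is an assumption about that convention, not a definition taken from this page:

```python
# Hedged sketch: "Recall@5" as class-mean top-5 recall. The function
# name and this exact formulation are illustrative assumptions.
import numpy as np

def class_mean_top5_recall(scores: np.ndarray, labels: np.ndarray) -> float:
    """Average, over classes, of the fraction of that class's samples
    whose true label appears among the 5 highest-scoring predictions."""
    top5 = np.argsort(scores, axis=1)[:, -5:]        # top-5 class indices per sample
    hit = (top5 == labels[:, None]).any(axis=1)      # per-sample top-5 hit
    per_class = [hit[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(per_class)) * 100           # reported as a percentage

# Deterministic toy example: 2 samples, 6 classes, true class = 1.
scores = np.array([[0.9, 0.1, 0.2, 0.3, 0.4, 0.5],   # class 1 ranks last  -> miss
                   [0.1, 0.9, 0.2, 0.3, 0.4, 0.5]])  # class 1 ranks first -> hit
labels = np.array([1, 1])
print(class_mean_top5_recall(scores, labels))  # 50.0
```

Averaging per class (rather than per sample) prevents frequent action classes from dominating the score, which matters on long-tailed action vocabularies.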
# | Model | Metric | Claimed | Verified | Status
1 | InAViT | Recall@5 | 23.75 | - | Unverified
2 | AVT++ | Recall@5 | 16.7 | - | Unverified
3 | AFFT | Recall@5 | 14.9 | - | Unverified
4 | Abstract Goal | Recall@5 | 14.29 | - | Unverified
5 | AVT+ | Recall@5 | 12.6 | - | Unverified
6 | TempAgg | Recall@5 | 12.6 | - | Unverified
7 | RULSTM | Recall@5 | 11.2 | - | Unverified
8 | TBN | Recall@5 | 11 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Abstract Goal | Top-1 Accuracy - Act. | 22.03 | - | Unverified
2 | AVT+ | Top-1 Accuracy - Act. | 16.84 | - | Unverified
3 | ImagineRNN | Top-1 Accuracy - Act. | 14.66 | - | Unverified
4 | RULSTM [24, 23] | Top-1 Accuracy - Act. | 14.39 | - | Unverified
5 | ED | Top-1 Accuracy - Act. | 8.08 | - | Unverified
6 | ATSN | Top-1 Accuracy - Act. | 6 | - | Unverified
7 | 2SCNN | Top-1 Accuracy - Act. | 4.32 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Abstract Goal | Top-1 Accuracy - Act. | 13.28 | - | Unverified
2 | AVT+ | Top-1 Accuracy - Act. | 10.41 | - | Unverified
3 | ImagineRNN | Top-1 Accuracy - Act. | 9.25 | - | Unverified
4 | RULSTM [24, 23] | Top-1 Accuracy - Act. | 8.16 | - | Unverified
5 | ED | Top-1 Accuracy - Act. | 2.65 | - | Unverified
6 | ATSN | Top-1 Accuracy - Act. | 2.39 | - | Unverified
7 | 2SCNN | Top-1 Accuracy - Act. | 2.29 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | UADT | Top-1 Accuracy | 68.4 | - | Unverified
2 | InAViT | Top-1 Accuracy | 67.8 | - | Unverified
3 | Abstract Goal | Top-1 Accuracy | 49.8 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Goal Consistency | Verbs Recall@5 | 60.04 | - | Unverified
2 | TempAgg | Verbs Recall@5 | 59.11 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Action anticipation baseline (co-training, with gaze) | Accuracy | 45.45 | - | Unverified
2 | Action anticipation baseline (co-training, no gaze) | Accuracy | 38.7 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | UADT | Top-1 Accuracy | 62.7 | - | Unverified