Recurrent Models of Visual Attention
Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
Code
- github.com/abdosharaf98/active-sensing-paper (pytorch) ★ 10
- github.com/MiuGod0126/RAM-Paddle (paddle) ★ 0
- github.com/ulstu/robotics_ml ★ 0
- github.com/amobiny/Recurrent_Attention_Model (tf) ★ 0
- github.com/conan7882/recurrent-attention-model (tf) ★ 0
- github.com/Sooram/cnn (tf) ★ 0
- github.com/kevinzakka/recurrent-visual-attention (pytorch) ★ 0
- github.com/johnrobinsn/catch (pytorch) ★ 0
- github.com/mengdi-li/robotic-occlusion-reasoning (pytorch) ★ 0
- github.com/tianyu-tristan/Visual-Attention-Model (tf) ★ 0
Abstract
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
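The core idea of processing only selected regions at high resolution is implemented in the paper by a retina-like glimpse sensor: at each step the model crops several concentric patches around the chosen location, with coarser scales covering progressively larger areas at lower resolution. The sketch below illustrates that glimpse extraction in NumPy; the function name, parameters, and the use of simple block-average pooling are our own illustrative choices, not taken from the paper (which also pairs this sensor with an RNN core and a REINFORCE-trained location policy, omitted here).

```python
import numpy as np

def extract_glimpse(image, center, size=4, n_scales=2):
    """Retina-like glimpse sketch: crop n_scales square patches centered
    at `center`, each twice the width of the previous one, and pool every
    patch down to `size` x `size`, so coarser scales see a wider but
    lower-resolution view. (Hypothetical helper, not the paper's code.)"""
    cy, cx = center
    patches = []
    for k in range(n_scales):
        half = size * (2 ** k) // 2
        # Zero-pad so crops near the image border stay square.
        padded = np.pad(image, half, mode="constant")
        # In padded coordinates the center shifts by `half`, so this
        # slice is the 2*half x 2*half window around (cy, cx).
        patch = padded[cy : cy + 2 * half, cx : cx + 2 * half]
        # Block-average pooling down to size x size.
        f = patch.shape[0] // size
        pooled = patch.reshape(size, f, size, f).mean(axis=(1, 3))
        patches.append(pooled)
    return np.stack(patches)  # shape: (n_scales, size, size)
```

Note that the cost of a glimpse depends only on `size` and `n_scales`, not on the image dimensions, which is how the model's computation can be controlled independently of input size.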