
Recurrent model of visual attention

This post implements the Recurrent Models of Visual Attention paper by Mnih et al., 2014. Like the authors, we refer to this algorithm as the Recurrent Attention Model (RAM).

RAM

The RAM continues the line of work on attention models for images. Conventional approaches to image classification scale poorly with image size. Humans do not absorb images in one shot: we scan the image and attend to the parts of interest. The RAM models this attention-seeking behavior.

Our previous posts treat attention models too. The DRAW post attends to images via Gaussian filters. Another post shows three implementations of attention using feature keys. In the RAM paper, the authors observe that humans decide where to look. The eye then focuses on a thumb-sized patch, and all other information gets blurred.

This processing forms the basis of the RAM. A fovea-like extract centers on a point that is conditioned on the state of a network. Concretely, the hidden state of an LSTM maps to a coordinate vector, and the fovea-like extract at that coordinate forms the input to the LSTM at the next time-step.
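To make this loop concrete, here is a minimal numpy sketch of such a fovea-like extract. The function name extract_glimpse, the 8x8 patch size, and the strided downsampling are illustrative assumptions rather than the code in this repository; TensorFlow ships a comparable op as tf.image.extract_glimpse.

```python
import numpy as np

def extract_glimpse(image, loc, size=8, num_scales=3):
    """Crop num_scales patches centered at loc (in [-1, 1] coordinates),
    each twice as wide as the previous, and downsample all of them to
    size x size. The innermost patch stays sharp; the outer patches get
    blurred by the downsampling, mimicking the fovea."""
    h, w = image.shape
    cy = int((loc[0] + 1) / 2 * h)  # map [-1, 1] to pixel rows
    cx = int((loc[1] + 1) / 2 * w)  # map [-1, 1] to pixel columns
    patches = []
    for s in range(num_scales):
        half = size * (2 ** s) // 2
        # Pad so that glimpses near the border remain valid crops
        padded = np.pad(image, half, mode='constant')
        patch = padded[cy:cy + 2 * half, cx:cx + 2 * half]
        stride = max(1, patch.shape[0] // size)
        patches.append(patch[::stride, ::stride][:size, :size])
    return np.stack(patches)  # shape: (num_scales, size, size)

glimpse = extract_glimpse(np.random.rand(28, 28), loc=np.zeros(2))
print(glimpse.shape)  # (3, 8, 8)
```

Stacking the scales gives the network a sharp view of the center and a coarse view of the surroundings, so a small glimpse can still summarize a large image.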

Our model of the fovea is non-differentiable, so we cannot backpropagate through the coordinate vector. Instead, the stochasticity in the coordinate vector allows us to use the REINFORCE rule. For this reinforcement-learning agent, the reward follows from a correct prediction after a fixed number of time-steps.
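As a sketch of that rule, assume the coordinates are sampled from a fixed-variance Gaussian centered on the network's output; the sigma, reward, and baseline values below are assumptions for illustration. REINFORCE weights the log-likelihood gradient by the baseline-corrected reward.

```python
import numpy as np

def reinforce_grad(mean, sampled_loc, reward, baseline, sigma=0.1):
    """REINFORCE gradient w.r.t. the policy mean for a Gaussian
    location policy N(mean, sigma^2 I):
        grad = (R - b) * d/d(mean) log p(sampled_loc | mean)
    and for a Gaussian, d log p / d mean = (loc - mean) / sigma**2."""
    return (reward - baseline) * (sampled_loc - mean) / sigma ** 2

mean = np.array([0.1, -0.2])                   # network's proposed location
sampled_loc = mean + 0.1 * np.random.randn(2)  # sample with sigma = 0.1
grad = reinforce_grad(mean, sampled_loc, reward=1.0, baseline=0.5)
```

The baseline does not change the expected gradient, but it reduces its variance, which matters because the reward only arrives at the final time-step.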

Results

The implementation builds upon this work, and our visualization resembles his plots.

[RAM_gif: animation of the glimpse trajectory]

The red squares highlight where the network centers its glimpses. I trained this network on an old laptop for a couple of hours. With more computing power, you might decrease the glimpse size and allow for more time-steps. Those changes would make the RAM resemble the human visual system more closely. The corresponding classifier achieves 87% accuracy on MNIST.

This result makes us wonder whether the attention follows the digit or merely follows from random perturbations. Therefore, we translate the digits within a larger 60x60 image.

[show_translate: example of digits translated on the 60x60 canvas]
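As a minimal sketch of this preprocessing step, the snippet below pastes a 28x28 digit at a uniformly random position on a blank 60x60 canvas; the function name translate_digit and the uniform placement are assumptions for illustration.

```python
import numpy as np

def translate_digit(digit, canvas_size=60, rng=np.random):
    """Place a digit (e.g. 28x28 MNIST) at a random position on a
    blank canvas_size x canvas_size canvas."""
    h, w = digit.shape
    canvas = np.zeros((canvas_size, canvas_size), dtype=digit.dtype)
    top = rng.randint(0, canvas_size - h + 1)
    left = rng.randint(0, canvas_size - w + 1)
    canvas[top:top + h, left:left + w] = digit
    return canvas
```

If the glimpses keep landing on the digit rather than on the empty canvas, the attention is genuinely following the content.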

Discussion

Initially, I learned from this implementation. This list summarizes my main changes:

As always, I am curious to hear any comments and questions. Reach me at romijndersrob@gmail.com