Variational Recurrent Auto Encoder

In a previous post, we discussed the variational auto encoder. This post extends that idea to data of a sequential nature, using a recurrent network. Variational auto encoders belong to the family of variational inference methods. In normal networks, we define deterministic functions and backpropagate gradients through them. In variational inference, the network also contains stochastic layers, or information layers, and we regard the cost function from a Bayesian standpoint. Altogether, this gives us a likelihood of our data under the model. This likelihood turns out to be intractable, so we optimize a bound on it instead. Hence, variational inference.
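For reference, the bound we optimize is the standard evidence lower bound (ELBO) of the variational auto encoder, with encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$ and prior $p(z)$:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] \;-\; D_{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$$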

Computation graph

An auto encoder always follows a similar structure: an encoder maps the data to a dense representation, and a decoder reconstructs the data from this representation. In a normal auto encoder, the dense representation can be any layer in the neural network. By constraining the size of that layer and training with backpropagation, we learn a dense representation of the data.
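As a minimal sketch of this structure (with hypothetical layer sizes and randomly initialised weights standing in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 100-dimensional data squeezed through a 2-dimensional bottleneck
W_enc = rng.normal(0, 0.1, (100, 2))
W_dec = rng.normal(0, 0.1, (2, 100))

def encoder(x):
    # data -> dense representation
    return np.tanh(x @ W_enc)

def decoder(h):
    # dense representation -> reconstruction
    return h @ W_dec

x = rng.normal(size=100)
x_hat = decoder(encoder(x))  # train by minimising the reconstruction error ||x - x_hat||
```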

Information layer

In variational auto encoders, the dense representation is also called the information layer. This layer follows from information theory, and we reason about it accordingly. There are two viewpoints on this layer, but they boil down to the same math.

The corresponding Python code is as follows:

```python
with tf.name_scope("Latent_space") as scope:
    # Reparameterization trick: sample eps ~ N(0, 1), then shift and scale it
    # with the predicted mean and log-variance of the posterior
    self.eps = tf.random_normal(tf.shape(self.z_mu), 0, 1, dtype=tf.float32)
    self.z = self.z_mu + tf.mul(tf.sqrt(tf.exp(z_sig_log_sq)), self.eps)  # z is the vector in latent space
```
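For completeness, a sketch of the matching KL term of the loss, assuming the same variable names as above and a diagonal Gaussian posterior against a standard-normal prior (not necessarily the repository's exact code):

```python
# KL( N(z_mu, exp(z_sig_log_sq)) || N(0, 1) ), summed over the latent dimensions
self.kl_loss = -0.5 * tf.reduce_sum(
    1.0 + z_sig_log_sq - tf.square(self.z_mu) - tf.exp(z_sig_log_sq), 1)
```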

Model

These experiments use an LSTM as the recurrent neural network. An LSTM can capture long-term dependencies, which is beneficial for the VRAE, where the only information arises from the latent space. From the latent space, the model predicts the initial state. Throughout the sequence, the output at every step is fed back into the LSTM as the next input. This way, the model knows what it just predicted.
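A minimal numpy sketch of this decoding loop (hypothetical names, a single-layer LSTM, and randomly initialised weights in place of the trained TensorFlow model) illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_hidden, n_coord, seq_len = 2, 64, 3, 100  # hypothetical sizes

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Randomly initialised weights stand in for the trained parameters
W_init = rng.normal(0, 0.1, (n_latent, 2 * n_hidden))           # z -> (h0, c0)
W_lstm = rng.normal(0, 0.1, (n_coord + n_hidden, 4 * n_hidden))  # LSTM gates
W_out = rng.normal(0, 0.1, (n_hidden, n_coord))                  # h -> (X, Y, Z)

def decode(z, x0):
    """Generate a trajectory from a latent vector z: z seeds the initial
    LSTM state, and every prediction is fed back as the next input."""
    h, c = np.split(np.tanh(z @ W_init), 2)
    x, trajectory = x0, []
    for _ in range(seq_len):
        gates = np.concatenate([x, h]) @ W_lstm
        i, f, o, g = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        x = h @ W_out         # predicted (X, Y, Z) coordinate...
        trajectory.append(x)  # ...which is fed back in at the next step
    return np.stack(trajectory)

trajectory = decode(rng.normal(size=n_latent), np.zeros(n_coord))
```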

Results

Fortunately, the model can auto encode the basketball trajectories with two latent variables. In this post, we work with three-point shots. They are obtained from NBA games, where a tracking system records the X, Y and Z coordinates of the ball at 25 Hz during the game.

A two-dimensional latent space allows for visualizations. For example, in the image below we color the latent space according to the x coordinate from which the ball is shot.

[Figure: color_x]

And in this scatterplot, the y coordinate colors the points:

[Figure: color_y]

The points in these scatterplots correspond to the means of the latent space for data in the validation set.

Interestingly, the latent space cares most about the x and y coordinates from which the ball is shot. It arranges the x and y coordinates of the starting point in exactly the horseshoe shape of the three-point line.

In another scatterplot, we color the points according to whether the shot is a hit or a miss.

[Figure: color_hitmiss]

This image shows no obvious clustering that we can reason about. A reason could be as follows. To reconstruct a trajectory, it is important to know the start point. These are NBA games, so the shots all go more or less directly toward the basket. Misses are only slightly off from hits, and many of them bounce off the rim. In that sense, the latent space does not need to convey information about hit/miss probability, as it would not lower the reconstruction loss.

Want to find out more? Here are some directions:

As always, I am curious about any comments and questions. Reach me at romijndersrob@gmail.com