Variational Recurrent Auto Encoder

In a previous post, we discussed the variational auto encoder. This post extends that idea to data of a sequential nature, using a recurrent network. Variational auto encoders belong to the family of variational inference methods. In normal networks, we define deterministic functions and backpropagate gradients through them. In variational inference, the network also contains stochastic layers, or information layers, and we regard the cost function from a Bayesian standpoint. Altogether, this gives us a likelihood of our data under the model. This likelihood turns out to be intractable, so we optimize a bound on it instead. Hence, variational inference.
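For reference, the bound we optimize is the standard evidence lower bound (ELBO) of the variational auto encoder, with encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$ and prior $p(z)$:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] \;-\; D_{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)$$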

Computation graph

An auto encoder always follows a similar structure: an encoder maps the data to a dense representation, and a decoder reconstructs the data from this representation. In a normal auto encoder, the dense representation can be any layer in the neural network. By constraining the size of that layer and training with backpropagation, we learn a dense representation of the data.
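As a minimal sketch of this structure (with hypothetical layer sizes and randomly initialised weights standing in for trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 100-dimensional data squeezed through a 2-dimensional bottleneck
W_enc = rng.normal(0, 0.1, (100, 2))
W_dec = rng.normal(0, 0.1, (2, 100))

def encoder(x):
    # data -> dense representation
    return np.tanh(x @ W_enc)

def decoder(h):
    # dense representation -> reconstruction
    return h @ W_dec

x = rng.normal(size=100)
x_hat = decoder(encoder(x))  # train by minimising the reconstruction error ||x - x_hat||
```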

Information layer

In variational auto encoders, the dense representation is also called the information layer. This layer follows from information theory, and we reason about it accordingly. There are two viewpoints on this layer, but they boil down to the same math.

The corresponding Python code is as follows:

```python
with tf.name_scope("Latent_space") as scope:
    # Reparameterization trick: sample eps ~ N(0, 1), then shift and scale it
    # with the predicted mean and log-variance of the posterior
    self.eps = tf.random_normal(tf.shape(self.z_mu), 0, 1, dtype=tf.float32)
    self.z = self.z_mu + tf.mul(tf.sqrt(tf.exp(z_sig_log_sq)), self.eps)  # z is the vector in latent space
```
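For completeness, a sketch of the matching KL term of the loss, assuming the same variable names as above and a diagonal Gaussian posterior against a standard-normal prior (not necessarily the repository's exact code):

```python
# KL( N(z_mu, exp(z_sig_log_sq)) || N(0, 1) ), summed over the latent dimensions
self.kl_loss = -0.5 * tf.reduce_sum(
    1.0 + z_sig_log_sq - tf.square(self.z_mu) - tf.exp(z_sig_log_sq), 1)
```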

Model

These experiments use an LSTM as the recurrent neural network. An LSTM can capture long-term dependencies, which is beneficial for the VRAE, where the only information arises from the latent space. From the latent space, the model predicts the initial state. Throughout the sequence, the output at every step is fed back into the LSTM as the next input. This way, the model knows what it just predicted.
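A minimal numpy sketch of this decoding loop (hypothetical names, a single-layer LSTM, and randomly initialised weights in place of the trained TensorFlow model) illustrates the idea:

```python
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_hidden, n_coord, seq_len = 2, 64, 3, 100  # hypothetical sizes

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Randomly initialised weights stand in for the trained parameters
W_init = rng.normal(0, 0.1, (n_latent, 2 * n_hidden))           # z -> (h0, c0)
W_lstm = rng.normal(0, 0.1, (n_coord + n_hidden, 4 * n_hidden))  # LSTM gates
W_out = rng.normal(0, 0.1, (n_hidden, n_coord))                  # h -> (X, Y, Z)

def decode(z, x0):
    """Generate a trajectory from a latent vector z: z seeds the initial
    LSTM state, and every prediction is fed back as the next input."""
    h, c = np.split(np.tanh(z @ W_init), 2)
    x, trajectory = x0, []
    for _ in range(seq_len):
        gates = np.concatenate([x, h]) @ W_lstm
        i, f, o, g = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        x = h @ W_out         # predicted (X, Y, Z) coordinate...
        trajectory.append(x)  # ...which is fed back in at the next step
    return np.stack(trajectory)

trajectory = decode(rng.normal(size=n_latent), np.zeros(n_coord))
```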

Results

Fortunately, the model can auto encode the basketball trajectories with two latent variables. In this post, we work with three-point shots. They are obtained from NBA games, where a tracking system records the X, Y and Z coordinates of the ball at 25 Hz during the game.

A two-dimensional latent space allows for visualizations. For example, in the image below we color the latent space according to the x coordinate from which the ball is shot.

[Figure: color_x]

And in this scatterplot, the y coordinate colors the points:

[Figure: color_y]

The points in these scatterplots correspond to the means of the latent space for data in the validation set.

Interestingly, the latent space cares most about the x and y coordinates from which the ball is shot. It arranges the x and y coordinates of the starting point in exactly the horseshoe shape of the three-point line.

In another scatterplot, we color the points according to whether the shot is a hit or a miss.

[Figure: color_hitmiss]

This image shows no obvious clustering that we can reason about. A reason could be as follows. To reconstruct a trajectory, it is important to know the start point. These are NBA games, so the shots all go more or less directly toward the basket. Misses are only slightly off from hits, and many of them bounce off the rim. In that sense, the latent space does not need to convey information about hit/miss probability, as it would not lower the reconstruction loss.

Want to find out more? Here are some directions:

As always, I am curious about any comments and questions. Reach me at romijndersrob@gmail.com