
Ladder network

This post implements the ladder network, proposed by Valpola (2015) and Rasmus et al. (2015), in TensorFlow. The ladder network allows for semi-supervised learning: the network learns from both labelled and unlabelled data. For example, the ladder network achieves only 1.06% test error on MNIST using just 100 labelled examples (ten labelled examples per class).
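To make that setting concrete, here is a minimal sketch of how such a semi-supervised MNIST split could be built. This is not the repo's actual data pipeline; the function name and the 10-per-class sampling are assumptions matching the numbers above.

```python
# A minimal sketch (not the repo's data pipeline) of a semi-supervised
# MNIST split: 100 labelled examples (10 per class), the rest unlabelled.
import numpy as np

def semi_supervised_split(images, labels, n_labelled_per_class=10, seed=0):
    """Return (x_labelled, y_labelled, x_unlabelled).

    `labels` is assumed to be an integer array of class indices 0..9.
    """
    rng = np.random.RandomState(seed)
    labelled_idx = []
    for c in range(10):
        idx = np.where(labels == c)[0]
        labelled_idx.extend(rng.choice(idx, n_labelled_per_class, replace=False))
    labelled_idx = np.array(labelled_idx)
    # The remaining examples are treated as unlabelled; in ladder training
    # the labelled examples typically contribute to the unsupervised
    # objective as well.
    unlabelled_idx = np.setdiff1d(np.arange(len(labels)), labelled_idx)
    return images[labelled_idx], labels[labelled_idx], images[unlabelled_idx]
```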

Outline

This section outlines the network; the papers describe further details. A ladder network combines supervised and unsupervised learning. The supervised part takes the form of a standard classifier: a feedforward, convolutional, or recurrent net. The unsupervised part takes the form of an auto-encoder over the distributed representations. The two contrast in how much detail they preserve. Supervised learning serves the one task at hand and discards any detail in the input that does not contribute to it. Unsupervised learning preserves a wide range of details and concepts in order to reconstruct the input at the decoder. Every stage of the ladder combines the two: one objective function monitors the reconstruction of the distributed representation at each layer of the decoder, while another objective (at the topmost level) monitors the classification performance of the encoder. The reconstruction objectives require no labels, yet they involve the entire network. Therefore, unlabelled data can train the network too.
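As an illustration, here is a minimal sketch of how these objectives might be combined into a single training cost. This is not the repo's actual code: the function name, the `denoise_weights` hyperparameters, and the TensorFlow 1.x-style API are assumptions for illustration.

```python
# A minimal sketch of the combined ladder cost, assuming a TensorFlow
# 1.x-style graph. Layer-wise reconstruction costs (unsupervised) are
# added to a cross-entropy term (supervised) that is only active for
# labelled examples.
import tensorflow as tf

def ladder_cost(clean_activations, reconstructions, denoise_weights,
                logits, labels, labelled_mask):
    """clean_activations[l] and reconstructions[l] share one shape per layer;
    denoise_weights[l] weighs layer l's reconstruction cost (hyperparameters)."""
    # Unsupervised part: requires no labels, so every example contributes.
    recon_cost = tf.add_n([
        w * tf.reduce_mean(tf.square(z - z_hat))
        for w, z, z_hat in zip(denoise_weights, clean_activations, reconstructions)
    ])
    # Supervised part: cross-entropy on the topmost encoder output,
    # masked so only the labelled examples contribute.
    xent = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    sup_cost = tf.reduce_sum(labelled_mask * xent) / tf.maximum(
        tf.reduce_sum(labelled_mask), 1.0)
    return sup_cost + recon_cost
```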

Another way to understand ladder networks is via an analogy: humans do semi-supervised learning too. Say we train a friend to classify oak trees versus maple trees. He needs maybe five or six examples to classify them with reasonable accuracy, yet a computer needs thousands of examples to achieve similar accuracy. Why is this? Well, he has seen many trees throughout his life and understands the general structure of a tree. We only need a few examples to add the fine details that discriminate a maple from an oak. The interplay of encoder and decoder follows this analogy. The unsupervised/auto-encoding part of the ladder network teaches the network a gross understanding of the concepts in the data; it needs this to achieve reconstruction. The supervised/encoder part teaches the network the fine-grained details that discriminate one class from another. In other words, the unsupervised objective preserves a range of features, and with the few labelled samples the network decides which features support classification. Do you prefer more intuitive explanations, images and videos? I'd recommend Harri Valpola's talk at this symposium.

Code

A quick Google search brings up two implementations in TensorFlow.
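Whatever the implementation, the heart of the decoder is the denoising (combinator) function that merges the noisy lateral activation with the top-down signal. Below is a minimal TensorFlow 1.x-style sketch of the parameterization proposed by Rasmus et al.; the variable initialization and per-unit shapes are assumptions for illustration, not the repo's code.

```python
# A sketch of the per-layer denoising step on the decoder side, using the
# combinator function g from Rasmus et al. (2015): z_hat = (z_tilde - mu) * v + mu.
# z_tilde is the noisy encoder activation at this layer; u is the signal
# coming down from the decoder layer above.
import tensorflow as tf

def combinator_g(z_tilde, u, size):
    """Denoise z_tilde given u; one trainable parameter per unit for each
    of the ten terms a1..a10, following the paper's naming."""
    def param(init, name):
        return tf.Variable(init * tf.ones([size]), name=name)
    a = [param(0.0, 'a%d' % (i + 1)) for i in range(9)]
    a.append(param(1.0, 'a10'))  # chosen so z_hat == z_tilde at initialization
    mu = a[0] * tf.sigmoid(a[1] * u + a[2]) + a[3] * u + a[4]
    v = a[5] * tf.sigmoid(a[6] * u + a[7]) + a[8] * u + a[9]
    # Interpolate between the lateral path (z_tilde) and the vertical path (u).
    return (z_tilde - mu) * v + mu
```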

Results

The ladder network applies to many neural-network architectures where you'd like semi-supervised learning. The results and visualizations will depend on your application. For the moment, I made it work on MNIST.

As always, I am curious about any comments and questions. Reach me at romijndersrob@gmail.com