
Ladder network

This post implements the ladder network, proposed by Valpola (2015) and Rasmus et al. (2015), in TensorFlow. The ladder network allows for semi-supervised learning: the network learns from both labelled and unlabelled data. For example, the ladder network achieves only 1.06% test error on MNIST using just 100 labelled examples (ten labelled examples per class).
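To make that setting concrete, here is a minimal sketch of how such a semi-supervised MNIST split could be built. This is not the repo's actual data pipeline; the function name and the 10-per-class sampling are assumptions matching the numbers above.

```python
# A minimal sketch (not the repo's data pipeline) of a semi-supervised
# MNIST split: 100 labelled examples (10 per class), the rest unlabelled.
import numpy as np

def semi_supervised_split(images, labels, n_labelled_per_class=10, seed=0):
    """Return (x_labelled, y_labelled, x_unlabelled).

    `labels` is assumed to be an integer array of class indices 0..9.
    """
    rng = np.random.RandomState(seed)
    labelled_idx = []
    for c in range(10):
        idx = np.where(labels == c)[0]
        labelled_idx.extend(rng.choice(idx, n_labelled_per_class, replace=False))
    labelled_idx = np.array(labelled_idx)
    # The remaining examples are treated as unlabelled; in ladder training
    # the labelled examples typically contribute to the unsupervised
    # objective as well.
    unlabelled_idx = np.setdiff1d(np.arange(len(labels)), labelled_idx)
    return images[labelled_idx], labels[labelled_idx], images[unlabelled_idx]
```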

Outline

This section outlines the network; the papers describe further details. A ladder network combines supervised and unsupervised learning. The supervised part takes the form of a standard classifier: a feedforward, convolutional, or recurrent net. The unsupervised part takes the form of an auto-encoder over the distributed representations. The two contrast in how much detail they preserve. Supervised learning serves the one task at hand and discards any detail in the input that does not contribute to it. Unsupervised learning preserves a wide range of details and concepts in order to reconstruct the input at the decoder. Every stage of the ladder combines the two: one objective function monitors the reconstruction of the distributed representation at each layer of the decoder, while another objective (at the topmost level) monitors the classification performance of the encoder. The reconstruction objectives require no labels, yet they involve the entire network. Therefore, unlabelled data can train the network too.
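As an illustration, here is a minimal sketch of how these objectives might be combined into a single training cost. This is not the repo's actual code: the function name, the `denoise_weights` hyperparameters, and the TensorFlow 1.x-style API are assumptions for illustration.

```python
# A minimal sketch of the combined ladder cost, assuming a TensorFlow
# 1.x-style graph. Layer-wise reconstruction costs (unsupervised) are
# added to a cross-entropy term (supervised) that is only active for
# labelled examples.
import tensorflow as tf

def ladder_cost(clean_activations, reconstructions, denoise_weights,
                logits, labels, labelled_mask):
    """clean_activations[l] and reconstructions[l] share one shape per layer;
    denoise_weights[l] weighs layer l's reconstruction cost (hyperparameters)."""
    # Unsupervised part: requires no labels, so every example contributes.
    recon_cost = tf.add_n([
        w * tf.reduce_mean(tf.square(z - z_hat))
        for w, z, z_hat in zip(denoise_weights, clean_activations, reconstructions)
    ])
    # Supervised part: cross-entropy on the topmost encoder output,
    # masked so only the labelled examples contribute.
    xent = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    sup_cost = tf.reduce_sum(labelled_mask * xent) / tf.maximum(
        tf.reduce_sum(labelled_mask), 1.0)
    return sup_cost + recon_cost
```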

Another way to understand ladder networks is via an analogy: humans do semi-supervised learning too. Say we train a friend to classify oak trees versus maple trees. He needs maybe five or six examples to classify them with reasonable accuracy, yet a computer needs thousands of examples to achieve similar accuracy. Why is this? Well, he has seen many trees throughout his life and understands the general structure of a tree. We only need a few examples to add the fine details that discriminate a maple from an oak. The interplay of encoder and decoder follows this analogy. The unsupervised/auto-encoding part of the ladder network teaches the network a gross understanding of the concepts in the data; it needs this to achieve reconstruction. The supervised/encoder part teaches the network the fine-grained details that discriminate one class from another. In other words, the unsupervised objective preserves a range of features, and with the few labelled samples the network decides which features support classification. Do you prefer more intuitive explanations, images and videos? I'd recommend Harri Valpola's talk at this symposium.

Code

A quick Google search brings up two implementations in TensorFlow.
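Whatever the implementation, the heart of the decoder is the denoising (combinator) function that merges the noisy lateral activation with the top-down signal. Below is a minimal TensorFlow 1.x-style sketch of the parameterization proposed by Rasmus et al.; the variable initialization and per-unit shapes are assumptions for illustration, not the repo's code.

```python
# A sketch of the per-layer denoising step on the decoder side, using the
# combinator function g from Rasmus et al. (2015): z_hat = (z_tilde - mu) * v + mu.
# z_tilde is the noisy encoder activation at this layer; u is the signal
# coming down from the decoder layer above.
import tensorflow as tf

def combinator_g(z_tilde, u, size):
    """Denoise z_tilde given u; one trainable parameter per unit for each
    of the ten terms a1..a10, following the paper's naming."""
    def param(init, name):
        return tf.Variable(init * tf.ones([size]), name=name)
    a = [param(0.0, 'a%d' % (i + 1)) for i in range(9)]
    a.append(param(1.0, 'a10'))  # chosen so z_hat == z_tilde at initialization
    mu = a[0] * tf.sigmoid(a[1] * u + a[2]) + a[3] * u + a[4]
    v = a[5] * tf.sigmoid(a[6] * u + a[7]) + a[8] * u + a[9]
    # Interpolate between the lateral path (z_tilde) and the vertical path (u).
    return (z_tilde - mu) * v + mu
```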

Results

The ladder network applies to many neural-network architectures where you'd like semi-supervised learning. The results and visualizations will depend on your application. For the moment, I made it work on MNIST.

As always, I am curious about any comments and questions. Reach me at romijndersrob@gmail.com