In this post, I'll be continuing on this variational autoencoder (VAE) line of exploration (previous posts: here and here) by writing about how to use variational autoencoders to do semi-supervised learning. In particular, I'll be explaining the technique used in "Semi-supervised Learning with Deep Generative Models" by Kingma et al. I'll be digging into the math (hopefully being more explicit than the paper), giving a bit more background on the variational lower bound, as well as my usual attempt at giving some more intuition. I've also put some notebooks on Github that compare the VAE methods with others such as PCA, CNNs, and pre-trained models. Enjoy!
I wrote a post on the hard parts about machine learning over at Rubikloud:
Here's a blurb:
Much of the buzz around machine learning lately has been around novel applications of deep learning models. They have captured our imagination by anthropomorphizing them, allowing them to dream, play games at superhuman levels, and read x-rays better than physicians. While these deep learning models are incredibly powerful with incredible ingenuity built into them, they are not humans, nor are they much more than “sufficiently large parametric models trained with gradient descent on sufficiently many examples.” In my experience, this is not the hard part about machine learning.
Beyond the flashy headlines, the high-level math, and the computation-heavy calculations, the whole point of machine learning — as has been with computing and software before it — has been its application to real-world outcomes. Invariably, this means dealing with the realities of messy data, generating robust predictions, and automating decisions.
Just as much of the impact of machine learning is beneath the surface, the hard parts of machine learning are not usually sexy. I would argue that the hard parts about machine learning fall into two areas: generating robust predictions and building machine learning systems.
I wrote a post about building a table tennis ranking model over at Rubikloud:
It uses Bradley-Terry probability model to predict the outcome of pair-wise comparisons (e.g. games or matches). I describe an easy algorithm for fitting the model (via MM-algorithms) as well as adding a simple Bayesian prior to handle ill-defined cases. I even have some code on Github so you can build your own ranking system using Google sheets.
Here's a blurb:
Many of our Rubikrew are big fans of table tennis, in fact, we’ve held an annual table tennis tournament for all the employees for three years running (and I’m the reigning champion). It’s an incredibly fun event where everyone in the company gets involved from the tournament participants to the spectators who provide lively play-by-play commentary.
Unfortunately, not everyone gets to participate either due to travel and scheduling issues, or by the fact that they miss the actual tournament period in the case of our interns and co-op students. Another downside is that the event is a single-elimination tournament, so while it has a clear winner the ranking of the participants is not clear.
Being a data scientist, I identified this as a thorny issue for our Rubikrew table tennis players. So, I did what any data scientist would do and I built a model.
In this post, I'm going to share some notes on implementing a variational autoencoder (VAE) on the Street View House Numbers (SVHN) dataset. My last post on variational autoencoders showed a simple example on the MNIST dataset but because it was so simple I thought I might have missed some of the subtler points of VAEs -- boy was I right! The fact that I'm not really a computer vision guy nor a deep learning guy didn't help either. Through this exercise, I picked up some of the basics in the "craft" of computer vision/deep learning area; there are a lot of subtle points that are easy to gloss over if you're just reading someone else's tutorial. I'll share with you some of the details in the math (that I initially got wrong) and also some of the implementation notes along with a notebook that I used to train the VAE. Please check out my previous post on variational autoencoders to get some background.
Update 2017-08-09: I actually found a bug in my original code where I was only using a small subset of the data! I fixed it up in the notebooks and I've added some inline comments below to say what I've changed. For the most part, things have stayed the same but the generated images are a bit blurry because the dataset isn't so easy anymore.
This post is going to talk about an incredibly interesting unsupervised learning method in machine learning called variational autoencoders. It's main claim to fame is in building generative models of complex distributions like handwritten digits, faces, and image segments among others. The really cool thing about this topic is that it has firm roots in probability but uses a function approximator (i.e. neural networks) to approximate an otherwise intractable problem. As usual, I'll try to start with some background and motivation, include a healthy does of math, and along the way try to convey some of the intuition of why it works. I've also annotated a basic example so you can see how the math relates to an actual implementation. I based much of this post on Carl Doersch's tutorial, which has a great explanation on this whole topic, so make sure you check that out too.