Taking a small break from some of the heavier math, I thought I'd write a post about (aka learn more about) a very popular neural network architecture called Residual Networks, aka ResNet. This architecture is very widely used because it's so simple yet so powerful at the same time. The architecture's performance is due to its ability to add hundreds of layers (talk about deep learning!) without degrading performance or adding difficulty to training. I really like these types of robust advances that don't require fiddling with all sorts of hyper-parameters to make them work. Anyways, I'll introduce the idea and show an implementation of ResNet on a few runs of a variational autoencoder that I put together on the CIFAR10 dataset.
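To make the "hundreds of layers without degrading performance" point a bit more concrete, here is a minimal NumPy sketch of a residual block. It is not the implementation from the post: the function name `residual_block`, the two-layer ReLU branch, and the toy shapes are all assumptions chosen for illustration. The only idea it is meant to show is the skip connection, where the block outputs its input plus a learned correction.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Return x + F(x), where F is a small two-layer ReLU network (illustrative only)."""
    hidden = np.maximum(0.0, x @ w1)   # first layer followed by ReLU
    return x + hidden @ w2             # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of 4 toy feature vectors

# With the residual branch zeroed out, each block is exactly the identity,
# so stacking 100 of them leaves the input unchanged.
zeros = np.zeros((8, 8))
out = x
for _ in range(100):
    out = residual_block(out, zeros, zeros)
```

Because a block can fall back to the identity this easily, adding more of them should not make the network worse than its shallower counterpart, which is roughly the intuition behind stacking so many layers.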
In this post, I'm going to be describing a really cool idea about how to improve variational autoencoders using inverse autoregressive flows. The main idea is that we can generate more powerful posterior distributions compared to a more basic isotropic Gaussian by applying a series of invertible transformations. This, in theory, will allow your variational autoencoder to fit better by concentrating the stochastic samples around a closer approximation to the true posterior. The math works out so nicely, while the results are kind of marginal. As usual, I'll go through some intuition, some math, and have an implementation with a few experiments I ran. Enjoy!
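As a rough illustration of what one of those invertible transformations can look like, here is a toy NumPy sketch of a single flow step. Everything in it is an assumption made for the example: the name `iaf_step`, the use of plain linear maps instead of neural networks, and the weight scales. The part that matters is that the shift and scale for dimension i depend only on dimensions before i (enforced here with strictly lower-triangular weights), which makes the Jacobian triangular and its log-determinant just the sum of the log scales.

```python
import numpy as np

def iaf_step(z, w_mu, w_s):
    """One toy autoregressive flow step: z_new = mu(z) + sigma(z) * z."""
    # Strictly lower-triangular weights: mu_i and sigma_i only see z_j with j < i.
    mu = z @ np.tril(w_mu, k=-1).T
    log_sigma = z @ np.tril(w_s, k=-1).T
    z_new = mu + np.exp(log_sigma) * z
    # The Jacobian is lower triangular with sigma on the diagonal.
    log_det = log_sigma.sum(axis=-1)
    return z_new, log_det

rng = np.random.default_rng(0)
dim = 4
w_mu = 0.5 * rng.normal(size=(dim, dim))   # hypothetical "learned" weights
w_s = 0.1 * rng.normal(size=(dim, dim))

z0 = rng.normal(size=dim)                  # sample from the simple base posterior
z1, log_det = iaf_step(z0, w_mu, w_s)
```

Chaining several such steps transforms the simple Gaussian sample into a sample from a more flexible distribution, with the density correction accumulated from each step's `log_det`.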
You might think that I'd be bored with autoencoders by now but I still find them extremely interesting! In this post, I'm going to be explaining a cute little idea that I came across in the paper MADE: Masked Autoencoder for Distribution Estimation. Traditional autoencoders are great because they can perform unsupervised learning by mapping an input to a latent representation. However, one drawback is that they don't have a solid probabilistic basis (of course there are other variants of autoencoders that do, see previous posts here, here, and here). By using what the authors define as the autoregressive property, we can transform the traditional autoencoder approach into a fully probabilistic model with very little modification! As usual, I'll provide some intuition, math and an implementation.
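For a quick feel of the autoregressive property, here is a small NumPy sketch of the masking trick for a single hidden layer. The function name `made_masks`, the random degree assignment, and the layer sizes are assumptions for illustration, not code from the post. Each hidden unit gets a "degree", and the masks only keep connections that let output i depend on inputs before i.

```python
import numpy as np

def made_masks(n_inputs, n_hidden, seed=0):
    """Build input->hidden and hidden->output masks for one hidden layer."""
    rng = np.random.default_rng(seed)
    in_deg = np.arange(1, n_inputs + 1)                  # input degrees 1..D
    hid_deg = rng.integers(1, n_inputs, size=n_hidden)   # hidden degrees in 1..D-1
    # A hidden unit may see inputs with degree <= its own degree.
    mask_in = (hid_deg[:, None] >= in_deg[None, :]).astype(float)    # shape (H, D)
    # An output may see hidden units with degree strictly below its own.
    mask_out = (in_deg[:, None] > hid_deg[None, :]).astype(float)    # shape (D, H)
    return mask_in, mask_out

mask_in, mask_out = made_masks(n_inputs=5, n_hidden=16)

# Entry [i, j] counts the paths from input j to output i.
connectivity = mask_out @ mask_in
```

The `connectivity` matrix comes out strictly lower triangular, meaning output i has no path from input i or any later input. Multiplying the weight matrices of an ordinary autoencoder elementwise by these masks is what lets its outputs be read as conditional probabilities.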
In this post, I'll be continuing on this variational autoencoder (VAE) line of exploration (previous posts: here and here) by writing about how to use variational autoencoders to do semi-supervised learning. In particular, I'll be explaining the technique used in "Semi-supervised Learning with Deep Generative Models" by Kingma et al. I'll be digging into the math (hopefully being more explicit than the paper), giving a bit more background on the variational lower bound, as well as my usual attempt at giving some more intuition. I've also put some notebooks on GitHub that compare the VAE methods with others such as PCA, CNNs, and pre-trained models. Enjoy!
I wrote a post on the hard parts about machine learning over at Rubikloud:
Here's a blurb:
Much of the buzz around machine learning lately has been around novel applications of deep learning models. They have captured our imagination by anthropomorphizing them, allowing them to dream, play games at superhuman levels, and read x-rays better than physicians. While these deep learning models are incredibly powerful with incredible ingenuity built into them, they are not humans, nor are they much more than “sufficiently large parametric models trained with gradient descent on sufficiently many examples.” In my experience, this is not the hard part about machine learning.
Beyond the flashy headlines, the high-level math, and the computation-heavy calculations, the whole point of machine learning — as has been with computing and software before it — has been its application to real-world outcomes. Invariably, this means dealing with the realities of messy data, generating robust predictions, and automating decisions.
Just as much of the impact of machine learning is beneath the surface, the hard parts of machine learning are not usually sexy. I would argue that the hard parts about machine learning fall into two areas: generating robust predictions and building machine learning systems.