Following up on the math-y stuff from my last post, I'm going to be taking a look at another concept that pops up in ML: manifolds. They are best known in ML for their role in the manifold hypothesis. Manifolds are studied in two branches of mathematics: topology and differential geometry. I'll be focusing on the differential geometry view, which fortunately is a bit less abstract, better behaved, and more intuitive than the topological one. As usual, I'll go through some intuition, definitions, and examples to help clarify the ideas without going into too much depth or formality. I hope you mani-like it!
This post is going to take a step back from some of the machine learning topics that I've been writing about recently and go back to some basics: math! In particular, tensors. This is a topic that is casually mentioned in machine learning papers but for those of us who weren't physics or math majors (*cough* computer engineers), it's a bit murky trying to understand what's going on. So on my most recent vacation, I started reading a variety of sources on the interweb trying to piece together a picture of what tensors are all about. As usual, I'll skip the heavy formalities (partly because I probably couldn't do them justice) and instead try to explain the intuition with examples and more basic maths, relating it back to ML where possible. Hope you like it!
Taking a small break from some of the heavier math, I thought I'd write a post about (aka learn more about) a very popular neural network architecture called Residual Networks, aka ResNet. This architecture is very widely used because it's so simple yet so powerful at the same time. Its strength comes from its ability to stack hundreds of layers (talk about deep learning!) without degrading performance or making training more difficult. I really like these types of robust advances that don't require fiddling with all sorts of hyper-parameters to make them work. Anyways, I'll introduce the idea and show an implementation of ResNet, with a few runs of a variational autoencoder that I put together on the CIFAR10 dataset.
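To give a flavour of why stacking so many layers can be harmless, here is a minimal sketch of the skip connection idea in plain Python. It is not the implementation from the post: `residual_block`, `zero_residual`, and `stack` are hypothetical names, and the "layer" is a toy function rather than a real convolution. The only point is that a block computes x + F(x), so a block that has learned nothing (F(x) = 0) simply passes its input through.

```python
def residual_block(x, residual_fn):
    """Return x + F(x), where F is the block's residual function."""
    return [xi + fi for xi, fi in zip(x, residual_fn(x))]


def zero_residual(x):
    """Hypothetical stand-in for a block whose weights are all zero, so F(x) = 0."""
    return [0.0] * len(x)


def stack(x, blocks):
    """Apply a list of residual functions in sequence, each wrapped in a skip connection."""
    for residual_fn in blocks:
        x = residual_block(x, residual_fn)
    return x


x = [1.0, -2.0, 3.5]
# 200 blocks that have learned nothing yet: the stack behaves like the identity.
deep_output = stack(x, [zero_residual] * 200)
```

In this toy setting `deep_output` equals `x`, which is the intuition for why extra residual blocks do not have to hurt: each one only needs to learn a correction on top of the identity.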
In this post, I'm going to be describing a really cool idea about how to improve variational autoencoders using inverse autoregressive flows. The main idea is that we can generate more powerful posterior distributions than a basic isotropic Gaussian by applying a series of invertible transformations. This, in theory, will allow your variational autoencoder to fit better by concentrating the stochastic samples around a closer approximation to the true posterior. The math works out very nicely, although the results are kind of marginal. As usual, I'll go through some intuition, some math, and have an implementation with a few experiments I ran. Enjoy!
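As a rough illustration of "a series of invertible transformations", here is a small sketch of one flow step in plain Python. It is an assumption-laden toy, not the post's implementation: `ar_params` is a hypothetical hand-written stand-in for the autoregressive network, and the same function is reused at every step, whereas a real model would typically learn separate parameters per step. Each dimension is shifted and scaled using only the earlier dimensions, so the Jacobian is triangular and the log-density update is just a sum of log scales.

```python
import math
import random


def ar_params(prefix):
    """Toy stand-in for an autoregressive network: (mu_i, sigma_i) from z[:i] only."""
    s = sum(prefix)
    return 0.5 * s, math.exp(0.1 * math.tanh(s))  # sigma is always positive


def iaf_step(z, log_q):
    """One step: z_new[i] = mu_i + sigma_i * z[i], with mu_i, sigma_i functions of z[:i]."""
    z_new, log_det = [], 0.0
    for i, zi in enumerate(z):
        mu, sigma = ar_params(z[:i])
        z_new.append(mu + sigma * zi)
        log_det += math.log(sigma)  # triangular Jacobian: log|det| = sum of log sigma_i
    return z_new, log_q - log_det


def iaf_inverse(z_new):
    """Invert one step by recovering z one dimension at a time."""
    z = []
    for value in z_new:
        mu, sigma = ar_params(z)  # depends only on dimensions already recovered
        z.append((value - mu) / sigma)
    return z


def standard_normal_logpdf(z):
    """Log-density of an isotropic unit Gaussian at z."""
    return sum(-0.5 * (zi * zi + math.log(2.0 * math.pi)) for zi in z)


rng = random.Random(0)
z0 = [rng.gauss(0.0, 1.0) for _ in range(4)]  # sample from the basic Gaussian
z, log_q = z0, standard_normal_logpdf(z0)
for _ in range(3):  # chain a few steps to get a more flexible distribution
    z, log_q = iaf_step(z, log_q)
```

After the loop, `z` is a sample from the transformed distribution and `log_q` is its log-density, which is exactly what the variational objective needs. Note that the forward pass touches every dimension independently given the input, while the inverse has to proceed one dimension at a time.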
You might think that I'd be bored with autoencoders by now but I still find them extremely interesting! In this post, I'm going to be explaining a cute little idea that I came across in the paper MADE: Masked Autoencoder for Distribution Estimation. Traditional autoencoders are great because they can perform unsupervised learning by mapping an input to a latent representation. However, one drawback is that they don't have a solid probabilistic basis (of course there are other variants of autoencoders that do, see previous posts here, here, and here). By using what the authors define as the autoregressive property, we can transform the traditional autoencoder approach into a fully probabilistic model with very little modification! As usual, I'll provide some intuition, math and an implementation.
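To make the autoregressive property a bit more concrete, here is a minimal sketch of the masking trick for a single hidden layer, in plain Python. The function names and the random degree assignment are my own illustrative choices rather than code from the post. Each hidden unit gets a degree between 1 and D-1; it may only look at inputs with an index up to its degree, and output d may only look at hidden units with a degree strictly below d. Multiplying the weight matrices by these masks is the "very little modification" to a regular autoencoder.

```python
import random


def made_masks(num_inputs, num_hidden, seed=0):
    """Build input->hidden and hidden->output masks for one hidden layer (num_inputs >= 2)."""
    rng = random.Random(seed)
    # Degree m(k) of each hidden unit, drawn from 1..D-1.
    degrees = [rng.randint(1, num_inputs - 1) for _ in range(num_hidden)]
    # Hidden unit k may see input d only if m(k) >= d.
    mask_in = [[1 if degrees[k] >= d else 0 for d in range(1, num_inputs + 1)]
               for k in range(num_hidden)]
    # Output d may see hidden unit k only if d > m(k).
    mask_out = [[1 if d > degrees[k] else 0 for k in range(num_hidden)]
                for d in range(1, num_inputs + 1)]
    return mask_in, mask_out


def connectivity(mask_in, mask_out):
    """Count the paths from each input (column) to each output (row) through the hidden layer."""
    num_outputs, num_hidden = len(mask_out), len(mask_in)
    num_inputs = len(mask_in[0])
    return [[sum(mask_out[d][k] * mask_in[k][j] for k in range(num_hidden))
             for j in range(num_inputs)]
            for d in range(num_outputs)]


mask_in, mask_out = made_masks(num_inputs=5, num_hidden=8)
paths = connectivity(mask_in, mask_out)
```

The `paths` matrix is strictly lower triangular: output d has no path from input d or any later input, so each output can be read as a conditional on the preceding inputs only.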