Bounded Rationalityhttp://bjlkeng.github.io/Understanding math, machine learning, and data to a satisfactory degree.enSat, 16 Feb 2019 13:11:28 GMTNikola (getnikola.com)http://blogs.law.harvard.edu/tech/rssImportance Sampling and Estimating Marginal Likelihood in Variational Autoencodershttp://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/Brian Keng<div><p>It took a while but I'm back! This post is kind of a digression (which seems
to happen a lot) along my journey of learning more about probabilistic
generative models. There's so much in ML that you can't help learning a lot
of random things along the way. That's why it's interesting, right?</p>
<p>Today's topic is <em>importance sampling</em>. It's a really old idea that you may
have learned in a statistics class (I didn't) but that is somehow useful in deep learning;
what's old is new, right? It's relevant to the discussion because when
we have a large latent variable model (e.g. a variational
autoencoder), we want to be able to efficiently estimate the marginal likelihood
given data. The marginal likelihood is kind of taken for granted in the
experiments of some VAE papers when comparing different models. I was curious
how it was actually computed and it took me down this rabbit hole. Turns out
it's actually pretty interesting! As usual, I'll have a mix of background
material, examples, math and code to build some intuition around this topic.
Enjoy!</p>
<p><a href="http://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/">Read more…</a> (22 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsimportance samplingmathjaxMNISTMonte Carlovariational calculushttp://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/Wed, 06 Feb 2019 12:20:11 GMTLabel Refinery: A Softer Approachhttp://bjlkeng.github.io/posts/label-refinery/Brian Keng<div><p>This post is going to be about a really simple idea that is surprisingly effective
from a paper by Bagherinezhad et al. called <a class="reference external" href="https://arxiv.org/abs/1805.02641">Label Refinery: Improving ImageNet
Classification through Label Progression</a>.
The title pretty much says it all but I'll also discuss some intuition and show
some experiments on the CIFAR10 and SVHN datasets. The idea is both simple and
surprising, my favourite kind of idea! Let's take a look.</p>
<p><a href="http://bjlkeng.github.io/posts/label-refinery/">Read more…</a> (10 min remaining to read)</p></div>CIFAR10label refinerymathjaxresidual networkssvhnhttp://bjlkeng.github.io/posts/label-refinery/Tue, 04 Sep 2018 11:26:02 GMTUniversal ResNet: The One-Neuron Approximatorhttp://bjlkeng.github.io/posts/universal-resnet-the-one-neuron-approximator/Brian Keng<div><p><em>"In theory, theory and practice are the same. In practice, they are not."</em></p>
<p>I read a very interesting paper titled <em>ResNet with one-neuron hidden layers is
a Universal Approximator</em> by Lin and Jegelka [1].
The paper describes a simplified Residual Network as a universal approximator,
giving some theoretical backing to the wildly successful ResNet architecture.
In this post, I'm going to talk about this paper and a few of the related
universal approximation theorems for neural networks.
Instead of going through all the theoretical stuff, I'm simply going to introduce
some theorems and play around with some toy datasets to see if we can get close
to the theoretical limits.</p>
<p>(You might also want to check out my previous post where I played around with
ResNets: <a class="reference external" href="http://bjlkeng.github.io/posts/residual-networks">Residual Networks</a>)</p>
<p><a href="http://bjlkeng.github.io/posts/universal-resnet-the-one-neuron-approximator/">Read more…</a> (11 min remaining to read)</p></div>hidden layersmathjaxneural networksresidual networksResNetuniversal approximatorhttp://bjlkeng.github.io/posts/universal-resnet-the-one-neuron-approximator/Fri, 03 Aug 2018 12:03:28 GMTHyperbolic Geometry and Poincaré Embeddingshttp://bjlkeng.github.io/posts/hyperbolic-geometry-and-poincare-embeddings/Brian Keng<div><p>This post is finally going to get back to some ML related topics.
In fact, the original reason I took that whole math-y detour in the previous
posts was to more deeply understand this topic. It turns out trying to
understand tensor calculus and differential geometry (even to a basic level) takes a
while! Who knew? In any case, we're getting back to our regularly scheduled program.</p>
<p>In this post, I'm going to explain one of the applications of an abstract
area of mathematics called hyperbolic geometry. This area is of
interest because there has been a surge of research showing its
application in various fields, chief among them a paper by Facebook
researchers [1] in which they discuss how to utilize a model of hyperbolic
geometry to represent hierarchical relationships. I'll cover some of
the math, weighted more towards intuition, show some of their results, and also
show some sample code from Gensim. Don't worry, this time I'll try much harder
not to go down the rabbit hole of trying to explain all the math (no
promises though).</p>
<p>(Note: If you're unfamiliar with tensors or manifolds, I suggest getting a quick
overview with my previous two posts:
<a class="reference external" href="http://bjlkeng.github.io/posts/tensors-tensors-tensors">Tensors, Tensors, Tensors</a> and
<a class="reference external" href="http://bjlkeng.github.io/posts/manifolds">Manifolds: A Gentle Introduction</a>)</p>
<p><a href="http://bjlkeng.github.io/posts/hyperbolic-geometry-and-poincare-embeddings/">Read more…</a> (34 min remaining to read)</p></div>embeddingsgeometryhyperbolicmanifoldsmathjaxPoincaréhttp://bjlkeng.github.io/posts/hyperbolic-geometry-and-poincare-embeddings/Sun, 17 Jun 2018 12:20:18 GMTManifolds: A Gentle Introductionhttp://bjlkeng.github.io/posts/manifolds/Brian Keng<div><p>Following up on the math-y stuff from my <a class="reference external" href="http://bjlkeng.github.io/posts/tensors-tensors-tensors">last post</a>,
I'm going to be taking a look at another concept that pops up in ML: manifolds.
It is best known in ML for its use in the
<a class="reference external" href="https://www.quora.com/What-is-the-Manifold-Hypothesis-in-Deep-Learning">manifold hypothesis</a>.
Manifolds belong to two branches of mathematics: topology and differential
geometry. I'll be focusing more on the study of manifolds from the latter
category, which fortunately is a bit less abstract, more well behaved, and more
intuitive than the former. As usual, I'll go through some intuition,
definitions, and examples to help clarify the ideas without going into too much
depth or formalities. I hope you mani-like it!</p>
<p><a href="http://bjlkeng.github.io/posts/manifolds/">Read more…</a> (30 min remaining to read)</p></div>manifoldsmathjaxmetric tensorhttp://bjlkeng.github.io/posts/manifolds/Tue, 17 Apr 2018 11:24:57 GMTTensors, Tensors, Tensorshttp://bjlkeng.github.io/posts/tensors-tensors-tensors/Brian Keng<div><p>This post is going to take a step back from some of the machine learning
topics that I've been writing about recently and go back to some basics: math!
In particular, tensors. This is a topic that is casually mentioned in machine
learning papers but for those of us who weren't physics or math majors
(*cough* computer engineers), it's a bit murky trying to understand what's going on.
So on my most recent vacation, I started reading a variety of sources on the
interweb trying to piece together a picture of what tensors were all
about. As usual, I'll skip the heavy formalities (partly because I probably
couldn't do them justice) and instead try to explain the intuition using my
usual approach of examples and more basic maths. I'll sprinkle in a bunch of
examples and also try to relate it back to ML where possible. Hope you like
it!</p>
<p><a href="http://bjlkeng.github.io/posts/tensors-tensors-tensors/">Read more…</a> (23 min remaining to read)</p></div>bilinearcontravariancecovariancecovectorsgeometric vectorslinear transformationsmathjaxmetric tensortensorshttp://bjlkeng.github.io/posts/tensors-tensors-tensors/Tue, 13 Mar 2018 13:24:57 GMTResidual Networkshttp://bjlkeng.github.io/posts/residual-networks/Brian Keng<div><p>Taking a small break from some of the heavier math, I thought I'd write a post
about (aka learn more about) a very popular neural network architecture called
Residual Networks aka ResNet. This architecture is being very widely used
because it's so simple yet so powerful at the same time. The architecture's
performance is due to its ability to add hundreds of layers (talk about deep
learning!) without degrading performance or adding difficulty to training. I
really like these types of robust advances where it doesn't require fiddling
with all sorts of hyper-parameters to make it work. Anyways, I'll introduce
the idea and show an implementation of ResNet on a few runs of a variational
autoencoder that I put together on the CIFAR10 dataset.</p>
<p><a href="http://bjlkeng.github.io/posts/residual-networks/">Read more…</a> (9 min remaining to read)</p></div>autoencodersCIFAR10mathjaxresidual networksResNethttp://bjlkeng.github.io/posts/residual-networks/Sun, 18 Feb 2018 18:55:13 GMTVariational Autoencoders with Inverse Autoregressive Flowshttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Brian Keng<div><p>In this post, I'm going to be describing a really cool idea about how
to improve variational autoencoders using inverse autoregressive
flows. The main idea is that we can generate more powerful posterior
distributions compared to a more basic isotropic Gaussian by applying a
series of invertible transformations. This, in theory, will allow
your variational autoencoder to fit better by concentrating the
stochastic samples around a closer approximation to the true
posterior. The math works out so nicely while the results are kind of
marginal <a class="footnote-reference" href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/#id3" id="id1">[1]</a>. As usual, I'll go through some intuition, some math,
and have an implementation with a few experiments I ran. Enjoy!</p>
<p><a href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/">Read more…</a> (18 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsKullback-LeiblerMADEmathjaxMNISTvariational calculushttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Tue, 19 Dec 2017 13:47:38 GMTAutoregressive Autoencodershttp://bjlkeng.github.io/posts/autoregressive-autoencoders/Brian Keng<div><p>You might think that I'd be bored with autoencoders by now but I still
find them extremely interesting! In this post, I'm going to be explaining
a cute little idea that I came across in the paper <a class="reference external" href="https://arxiv.org/pdf/1502.03509.pdf">MADE: Masked Autoencoder
for Distribution Estimation</a>.
Traditional autoencoders are great because they can perform unsupervised
learning by mapping an input to a latent representation. However, one
drawback is that they don't have a solid probabilistic basis
(of course there are other variants of autoencoders that do, see previous posts
<a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders">here</a>,
<a class="reference external" href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset">here</a>, and
<a class="reference external" href="http://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders">here</a>).
By using what the authors define as the <em>autoregressive property</em>, we can
transform the traditional autoencoder approach into a fully probabilistic model
with very little modification! As usual, I'll provide some intuition, math and
an implementation.</p>
<p><a href="http://bjlkeng.github.io/posts/autoregressive-autoencoders/">Read more…</a> (17 min remaining to read)</p></div>autoencodersautoregressivegenerative modelsMADEmathjaxMNISThttp://bjlkeng.github.io/posts/autoregressive-autoencoders/Sat, 14 Oct 2017 14:02:15 GMTSemi-supervised Learning with Variational Autoencodershttp://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/Brian Keng<div><p>In this post, I'll be continuing on this variational autoencoder (VAE) line of
exploration
(previous posts: <a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders">here</a> and
<a class="reference external" href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset">here</a>) by
writing about how to use variational autoencoders to do semi-supervised
learning. In particular, I'll be explaining the technique used in
"Semi-supervised Learning with Deep Generative Models" by Kingma et al.
I'll be digging into the math (hopefully being more explicit than the paper),
giving a bit more background on the variational lower bound, as well as
my usual attempt at giving some more intuition.
I've also put some notebooks on Github that compare the VAE methods
with others such as PCA, CNNs, and pre-trained models. Enjoy!</p>
<p><a href="http://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/">Read more…</a> (25 min remaining to read)</p></div>autoencodersCIFAR10CNNgenerative modelsinceptionKullback-LeiblermathjaxPCAsemi-supervised learningvariational calculushttp://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/Mon, 11 Sep 2017 12:40:47 GMT