Bounded Rationalityhttp://bjlkeng.github.io/Understanding math, machine learning, and data to a satisfactory degree.enTue, 13 Mar 2018 13:40:22 GMTNikola (getnikola.com)http://blogs.law.harvard.edu/tech/rssTensors, Tensors, Tensorshttp://bjlkeng.github.io/posts/tensors-tensors-tensors/Brian Keng<div><p>This post is going to take a step back from some of the machine learning
topics that I've been writing about recently and go back to some basics: math!
In particular, tensors. This is a topic that is casually mentioned in machine
learning papers but for those of us who weren't physics or math majors
(*cough* computer engineers), it's a bit murky trying to understand what's going on.
So on my most recent vacation, I started reading a variety of sources on the
interweb trying to piece together a picture of what tensors were all
about. As usual, I'll skip the heavy formalities (partly because I probably
couldn't do them justice) and instead try to explain the intuition using my
usual approach of examples and more basic maths. I'll sprinkle in a bunch of
examples and also try to relate it back to ML where possible. Hope you like
it!</p>
<p><a href="http://bjlkeng.github.io/posts/tensors-tensors-tensors/">Read more…</a> (23 min remaining to read)</p></div>bilinearcontravariancecovariancecovectorsgeometric vectorslinear transformationsmathjaxmetric tensortensorshttp://bjlkeng.github.io/posts/tensors-tensors-tensors/Tue, 13 Mar 2018 13:24:57 GMTResidual Networkshttp://bjlkeng.github.io/posts/residual-networks/Brian Keng<div><p>Taking a small break from some of the heavier math, I thought I'd write a post
about (aka learn more about) a very popular neural network architecture called
Residual Networks aka ResNet. This architecture is being very widely used
because it's so simple yet so powerful at the same time. The architecture's
performance is due to its ability to add hundreds of layers (talk about deep
learning!) without degrading performance or adding difficulty to training. I
really like these types of robust advances where it doesn't require fiddling
with all sorts of hyper-parameters to make it work. Anyways, I'll introduce
the idea and show an implementation of ResNet on a few runs of a variational
autoencoder that I put together on the CIFAR10 dataset.</p>
<p><a href="http://bjlkeng.github.io/posts/residual-networks/">Read more…</a> (9 min remaining to read)</p></div>autoencodersCIFAR10mathjaxresidual networksResNethttp://bjlkeng.github.io/posts/residual-networks/Sun, 18 Feb 2018 18:55:13 GMTVariational Autoencoders with Inverse Autoregressive Flowshttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Brian Keng<div><p>In this post, I'm going to be describing a really cool idea about how
to improve variational autoencoders using inverse autoregressive
flows. The main idea is that we can generate more powerful posterior
distributions compared to a more basic isotropic Gaussian by applying a
series of invertible transformations. This, in theory, will allow
your variational autoencoder to fit better by concentrating the
stochastic samples around a closer approximation to the true
posterior. The math works out so nicely while the results are kind of
marginal <a class="footnote-reference" href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/#id3" id="id1">[1]</a>. As usual, I'll go through some intuition, some math,
and have an implementation with a few experiments I ran. Enjoy!</p>
<p><a href="http://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/">Read more…</a> (18 min remaining to read)</p></div>autoencodersautoregressiveCIFAR10generative modelsKullback-LeiblerMADEmathjaxMNISTvariational calculushttp://bjlkeng.github.io/posts/variational-autoencoders-with-inverse-autoregressive-flows/Tue, 19 Dec 2017 13:47:38 GMTAutoregressive Autoencodershttp://bjlkeng.github.io/posts/autoregressive-autoencoders/Brian Keng<div><p>You might think that I'd be bored with autoencoders by now but I still
find them extremely interesting! In this post, I'm going to be explaining
a cute little idea that I came across in the paper <a class="reference external" href="https://arxiv.org/pdf/1502.03509.pdf">MADE: Masked Autoencoder
for Distribution Estimation</a>.
Traditional autoencoders are great because they can perform unsupervised
learning by mapping an input to a latent representation. However, one
drawback is that they don't have a solid probabilistic basis
(of course there are other variants of autoencoders that do, see previous posts
<a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders">here</a>,
<a class="reference external" href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset">here</a>, and
<a class="reference external" href="http://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders">here</a>).
By using what the authors define as the <em>autoregressive property</em>, we can
transform the traditional autoencoder approach into a fully probabilistic model
with very little modification! As usual, I'll provide some intuition, math and
an implementation.</p>
<p><a href="http://bjlkeng.github.io/posts/autoregressive-autoencoders/">Read more…</a> (17 min remaining to read)</p></div>autoencodersautoregressivegenerative modelsMADEmathjaxMNISThttp://bjlkeng.github.io/posts/autoregressive-autoencoders/Sat, 14 Oct 2017 14:02:15 GMTSemi-supervised Learning with Variational Autoencodershttp://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/Brian Keng<div><p>In this post, I'll be continuing on this variational autoencoder (VAE) line of
exploration
(previous posts: <a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders">here</a> and
<a class="reference external" href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset">here</a>) by
writing about how to use variational autoencoders to do semi-supervised
learning. In particular, I'll be explaining the technique used in
"Semi-supervised Learning with Deep Generative Models" by Kingma et al.
I'll be digging into the math (hopefully being more explicit than the paper),
giving a bit more background on the variational lower bound, as well as
my usual attempt at giving some more intuition.
I've also put some notebooks on GitHub that compare the VAE methods
with others such as PCA, CNNs, and pre-trained models. Enjoy!</p>
<p><a href="http://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/">Read more…</a> (22 min remaining to read)</p></div>autoencodersCIFAR10CNNgenerative modelsinceptionKullback-LeiblermathjaxPCAsemi-supervised learningvariational calculushttp://bjlkeng.github.io/posts/semi-supervised-learning-with-variational-autoencoders/Mon, 11 Sep 2017 12:40:47 GMTThe Hard Thing about Machine Learninghttp://bjlkeng.github.io/posts/the-hard-thing-about-machine-learning/Brian Keng<div><p>I wrote a post on the hard parts about machine learning over
at Rubikloud:</p>
<ul class="simple">
<li><a class="reference external" href="https://rubikloud.com/labs/data-science/hard-thing-machine-learning/">The Hard Thing about Machine Learning</a></li>
</ul>
<p>Here's a blurb:</p>
<blockquote>
<p>Much of the buzz around machine learning lately has been around novel
applications of deep learning models. They have captured our imagination by
anthropomorphizing them, allowing them to dream, play games at superhuman
levels, and read x-rays better than physicians. While these deep learning
models are incredibly powerful with incredible ingenuity built into them,
they are not humans, nor are they much more than “sufficiently large
parametric models trained with gradient descent on sufficiently many
examples.” In my experience, this is not the hard part about machine
learning.</p>
<p>Beyond the flashy headlines, the high-level math, and the computation-heavy
calculations, the whole point of machine learning — as has been with
computing and software before it — has been its application to real-world
outcomes. Invariably, this means dealing with the realities of messy data,
generating robust predictions, and automating decisions.</p>
<p>...</p>
<p>Just as much of the impact of machine learning is beneath the surface, the
hard parts of machine learning are not usually sexy. I would argue that the
hard parts about machine learning fall into two areas: generating robust
predictions and building machine learning systems.</p>
</blockquote>
<p>Enjoy!</p></div>Machine LearningRubikloudsystemshttp://bjlkeng.github.io/posts/the-hard-thing-about-machine-learning/Tue, 22 Aug 2017 12:32:55 GMTBuilding A Table Tennis Ranking Modelhttp://bjlkeng.github.io/posts/building-a-table-tennis-ranking-model/Brian Keng<div><p>I wrote a post about building a table tennis ranking model over at Rubikloud:</p>
<ul class="simple">
<li><a class="reference external" href="https://rubikloud.com/labs/building-table-tennis-ranking-model/">Building A Table Tennis Ranking Model</a></li>
</ul>
<p>It uses the
<a class="reference external" href="https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model">Bradley-Terry</a>
probability model to predict the outcome of pair-wise comparisons (e.g. games
or matches). I describe an easy algorithm for fitting the model (via
MM-algorithms) as well as adding a simple Bayesian prior to handle ill-defined
cases. I even have some
<a class="reference external" href="https://github.com/bjlkeng/Bradley-Terry-Model">code on GitHub</a>
so you can build your own ranking system using Google Sheets.</p>
<p>Here's a blurb:</p>
<blockquote>
<p>Many of our Rubikrew are big fans of table tennis, in fact, we’ve held an
annual table tennis tournament for all the employees for three years
running (and I’m the reigning champion). It’s an incredibly fun event where
everyone in the company gets involved from the tournament participants to
the spectators who provide lively play-by-play commentary.</p>
<p>Unfortunately, not everyone gets to participate either due to travel and
scheduling issues, or by the fact that they miss the actual tournament
period in the case of our interns and co-op students. Another downside is
that the event is a single-elimination tournament, so while it has a clear
winner the ranking of the participants is not clear.</p>
<p>Being a data scientist, I identified this as a thorny issue for our
Rubikrew table tennis players. So, I did what any data scientist would do
and I built a model.</p>
</blockquote>
<p>Enjoy!</p></div>Bradley-Terryping pongrankingRubikloudtable tennishttp://bjlkeng.github.io/posts/building-a-table-tennis-ranking-model/Wed, 19 Jul 2017 12:51:41 GMTA Variational Autoencoder on the SVHN datasethttp://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/Brian Keng<div><p>In this post, I'm going to share some notes on implementing a variational
autoencoder (VAE) on the
<a class="reference external" href="http://ufldl.stanford.edu/housenumbers/">Street View House Numbers</a>
(SVHN) dataset. My last post on
<a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders">variational autoencoders</a>
showed a simple example on the MNIST dataset but because it was so simple I
thought I might have missed some of the subtler points of VAEs -- boy was I
right! The fact that I'm not really a computer vision guy nor a deep learning
guy didn't help either. Through this exercise, I picked up some of the basics
in the "craft" of the computer vision/deep learning area; there are a lot of subtle
points that are easy to gloss over if you're just reading someone else's
tutorial. I'll share with you some of the details in the math (that I
initially got wrong) and also some of the implementation notes along with a
notebook that I used to train the VAE. Please check out my previous post
on <a class="reference external" href="http://bjlkeng.github.io/posts/variational-autoencoders">variational autoencoders</a> to
get some background.</p>
<p><em>Update 2017-08-09: I actually found a bug in my original code where I was
only using a small subset of the data! I fixed it up in the notebooks and
I've added some inline comments below to say what I've changed. For the most
part, things have stayed the same but the generated images are a bit blurry
because the dataset isn't so easy anymore.</em></p>
<p><a href="http://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/">Read more…</a> (19 min remaining to read)</p></div>autoencodersgenerative modelsKullback-Leiblermathjaxsvhnvariational calculushttp://bjlkeng.github.io/posts/a-variational-autoencoder-on-the-svnh-dataset/Thu, 13 Jul 2017 12:13:03 GMTVariational Autoencodershttp://bjlkeng.github.io/posts/variational-autoencoders/Brian Keng<div><p>This post is going to talk about an incredibly interesting unsupervised
learning method in machine learning called variational autoencoders. Its main
claim to fame is in building generative models of complex distributions like
handwritten digits, faces, and image segments among others. The really cool
thing about this topic is that it has firm roots in probability but uses a
function approximator (i.e. neural networks) to approximate an otherwise
intractable problem. As usual, I'll try to start with some background and
motivation, include a healthy dose of math, and along the way try to convey
some of the intuition of why it works. I've also annotated a
<a class="reference external" href="https://github.com/bjlkeng/sandbox/blob/master/notebooks/variational-autoencoder.ipynb">basic example</a>
so you can see how the math relates to an actual implementation. I based much
of this post on Carl Doersch's <a class="reference external" href="https://arxiv.org/abs/1606.05908">tutorial</a>,
which has a great explanation on this whole topic, so make sure you check that
out too.</p>
<p><a href="http://bjlkeng.github.io/posts/variational-autoencoders/">Read more…</a> (25 min remaining to read)</p></div>autoencodersgenerative modelsKullback-Leiblermathjaxvariational calculushttp://bjlkeng.github.io/posts/variational-autoencoders/Tue, 30 May 2017 12:19:36 GMTVariational Bayes and The Mean-Field Approximationhttp://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/Brian Keng<div><p>This post is going to cover Variational Bayesian methods and, in particular,
the most common one, the mean-field approximation. This is a topic that I've
been trying to understand for a while now but didn't quite have all the background
that I needed. After picking up the main ideas from
<a class="reference external" href="http://bjlkeng.github.io/posts/the-calculus-of-variations">variational calculus</a> and
getting more fluent in manipulating probability statements like
in my <a class="reference external" href="http://bjlkeng.github.io/posts/the-expectation-maximization-algorithm">EM</a> post,
this variational Bayes stuff seems a lot easier.</p>
<p>Variational Bayesian methods are a set of techniques to approximate posterior
distributions in <a class="reference external" href="https://en.wikipedia.org/wiki/Bayesian_inference">Bayesian Inference</a>.
If this sounds a bit terse, keep reading! I hope to provide some intuition
so that the big ideas are easy to understand (which they are), but of course we
can't do that well unless we have a healthy dose of mathematics. I find that
missing background is the main blocker to understanding this subject
(admittedly, the math can sometimes be a bit cryptic too), so for some of the
background concepts I'll try to refer you to good sources (including my own). Enjoy!</p>
<p><a href="http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/">Read more…</a> (24 min remaining to read)</p></div>BayesianKullback-Leiblermathjaxmean-fieldvariational calculushttp://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/Mon, 03 Apr 2017 13:02:46 GMT