A Look at The First Place Solution of a Dermatology Classification Kaggle Competition

One interesting thing I often think about is the gap between academic and real-world solutions. In general, academic solutions play in the realm of idealized problem spaces, removing themselves from needing to care about the messiness of the real world. Kaggle competitions are a (small) step in the right direction towards dealing with messiness, usually providing a true blind test set (vs. overused benchmarks), and opening a few degrees of freedom in terms of the techniques that can be used, which usually favours robust methods over novelty. To this end, I thought it would be useful to take a look at a more realistic problem (via a Kaggle competition) and understand the practical details that result in a superior solution.

This post will cover the first place solution [1] to the SIIM-ISIC Melanoma Classification [0] challenge. In addition to using tried and true architectures (mostly EfficientNets), they have some interesting tactics they use to formulate the problem, process the data, and train/validate the model. I'll cover background on the ML techniques, competition and data, architectural details, problem formulation, and implementation. I've also run some experiments to better understand the benefits of certain choices they made. Enjoy!

Read more…

LLM Fun: Building a Q&A Bot of Myself

Unless you've been living under a rock, you've probably heard of large language models (LLMs) such as ChatGPT or Bard. I'm not one for riding a hype train but I do think LLMs are here to stay and are either going to have an impact as big as mobile as an interface (my current best guess) or perhaps something as big as the Internet itself. In either case, it behooves me to do a bit more investigation into this popular trend. At the same time, there are a bunch of other developer technologies that I've been wondering about like serverless computing, modern dev tools, and LLM-based code assistants, so I thought why not kill multiple birds with one stone.

This post is going to describe how I built a question-answering bot of myself using LLMs as well as my experience using the relevant developer tools such as ChatGPT, GitHub Copilot, Cloudflare Workers, and a couple of other related ones. I start out with my motivation for doing this project, some brief background on the technologies, a description of how I built everything including some evaluation of LLM outputs, and finally some commentary. This post is a lot less heavy on the math compared to my previous ones but it still has some good stuff so read on!

Read more…

Bayesian Learning via Stochastic Gradient Langevin Dynamics and Bayes by Backprop

After a long digression, I'm finally back to one of the main lines of research that I wanted to write about. The two main ideas in this post are not that recent but have been quite impactful (one of the papers won a recent ICML test of time award). They address two of the topics that are near and dear to my heart: Bayesian learning and scalability. Dare I even ask who wouldn't be interested in the intersection of these topics?

This post is about two techniques to perform scalable Bayesian inference. They both address the problem using stochastic gradient descent (SGD) but in very different ways. One leverages the observation that SGD plus some noise will converge to Bayesian posterior sampling [Welling2011], while the other generalizes the "reparameterization trick" from variational autoencoders to enable non-Gaussian posterior approximations [Blundell2015]. Both are easily implemented in the modern deep learning toolkit and thus benefit from the massive scalability of that toolchain. As usual, I will go over the necessary background (or refer you to my previous posts), intuition, some math, and a couple of toy examples that I implemented.
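To give a rough flavour of the "SGD plus some noise" idea, here is a minimal sketch of a stochastic gradient Langevin dynamics update on a toy problem: inferring the mean of a Gaussian with known variance under a Gaussian prior. The function names, the toy model, the prior and noise variances, and the constant step size are all my own assumptions for illustration (the original method uses a decaying step size schedule), so treat it as a sketch rather than the implementation from the post.

```python
import random


def sgld_step(theta, minibatch, n_total, step_size, rng,
              prior_var=10.0, noise_var=1.0):
    """One SGLD update for the mean of a Gaussian with known variance.

    Assumed toy model: theta ~ N(0, prior_var), x_i ~ N(theta, noise_var).
    """
    # Gradient of the log prior with respect to theta.
    grad_log_prior = -theta / prior_var
    # Minibatch gradient of the log likelihood, rescaled by N/n so it is an
    # unbiased estimate of the full-data gradient.
    scale = n_total / len(minibatch)
    grad_log_lik = scale * sum(x - theta for x in minibatch) / noise_var
    # Injected Gaussian noise with variance equal to the step size.
    noise = rng.gauss(0.0, step_size ** 0.5)
    return theta + 0.5 * step_size * (grad_log_prior + grad_log_lik) + noise


def run_sgld(data, n_steps=5000, batch_size=10, step_size=1e-3, seed=0):
    """Run SGLD with a constant step size (a simplification) and return all iterates."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        theta = sgld_step(theta, batch, len(data), step_size, rng)
        samples.append(theta)
    return samples


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]
    samples = run_sgld(data)
    kept = samples[1000:]  # discard early iterates as burn-in
    print("SGLD posterior mean estimate:", sum(kept) / len(kept))
    print("Analytic posterior mean:     ", sum(data) / (len(data) + 0.1))
```

The only difference from a plain SGD step on the log posterior is the injected noise term. Without it the iterates would settle near the posterior mode; with it they keep moving around, and after burn-in they behave approximately like posterior samples.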

Read more…

An Introduction to Stochastic Calculus

Through a couple of different avenues I wandered, yet again, down a rabbit hole leading to the topic of this post. The first avenue was through my main focus on a particular machine learning topic that utilized some concepts from physics, which naturally led me to stochastic calculus. The second avenue was through some projects at work in the quantitative finance space, which is one of the main applications of stochastic calculus. Naively, I thought I could write a brief post on it that would satisfy my curiosity -- that didn't work out at all! The result is this extra long post.

This post is about stochastic calculus, an extension of regular calculus to stochastic processes. It's not immediately obvious, but the rigour needed to properly understand some of the key ideas requires going back to the measure-theoretic definition of probability, so that's where I start in the background. From there I quickly move on to stochastic processes, the Wiener process, a particular flavour of stochastic calculus called Itô calculus, and finally end with a couple of applications. As usual, I try to include a mix of intuition, rigour where it helps intuition, and some simple examples. It's a deep and wide topic so I hope you enjoy my digest of it.

Read more…

Normalizing Flows with Real NVP

This post has been a long time coming. I originally started working on it several posts back but hit a roadblock in the implementation and then got distracted with some other ideas, which took me down various rabbit holes (here, here, and here). It feels good to finally get back on track to some core ML topics. The other nice thing about not being an academic researcher (not that I'm really researching anything here) is that there is no pressure to do anything! If it's just for fun, you can take your time with a topic, veer off track, and then come back to it later. It's nice having the freedom to do what you want (this applies to more than just learning about ML too)!

This post is going to talk about a class of deep probabilistic generative models called normalizing flows. Alongside variational autoencoders and autoregressive models (e.g. PixelCNN and autoregressive autoencoders), normalizing flows have been one of the big ideas in deep probabilistic generative models (I don't count GANs because they are not quite probabilistic). Specifically, I'll be presenting one of the earlier normalizing flow techniques named Real NVP (circa 2016). The formulation is simple but surprisingly effective, which makes it a good candidate to understand more about normalizing flows. As usual, I'll go over some background, the method, an implementation (with commentary on the details), and some experimental results. Let's get into the flow!

Read more…

Hi, I'm Brian Keng. This is the place where I write about all things technical.

Twitter: @bjlkeng


