One thing I find missing from many ML papers is how they relate to the fundamentals. There's always a throwaway line that assumes something not at all obvious (see my post on Importance Sampling). I'm the kind of person who likes to understand things to a satisfactory degree (it's literally in the subtitle of the blog), so I couldn't help investigating a minor idea I read about in a paper.
This post investigates how to use continuous density outputs (e.g. a logistic or normal distribution) to model discrete image data (e.g. 8-bit RGB values). It seems like it might be something obvious, such as setting the loss to the average log-likelihood of the continuous density, and that's almost the whole story. But leaving it at that skips over so many interesting (and non-obvious) things that you would never know about if you didn't bother to look. I'm a curious fellow, so come with me and let's take a look!
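To make the setup concrete before diving in, here is a minimal sketch of one common way to score discrete pixels with a continuous density: the discretized logistic, where each 8-bit value gets the probability mass of the logistic distribution over its bin (with the edge bins absorbing the tails). The function names, the rescaling of pixels to [-1, 1], and the parameter values are my own illustrative assumptions, not anything taken from a specific paper.

```python
import math

def logistic_cdf(x, mu, s):
    # CDF of a logistic distribution with location mu and scale s
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def discretized_logistic_log_prob(pixel, mu, s):
    # Log-probability of an 8-bit pixel value under a discretized logistic:
    # rescale {0, ..., 255} to [-1, 1], then integrate the density over a
    # half-bin on either side of the pixel's value.
    x = pixel / 127.5 - 1.0    # map {0, ..., 255} onto [-1, 1]
    half_bin = 1.0 / 255.0     # half the width of one pixel bin
    if pixel == 0:             # edge bins absorb the distribution's tails
        prob = logistic_cdf(x + half_bin, mu, s)
    elif pixel == 255:
        prob = 1.0 - logistic_cdf(x - half_bin, mu, s)
    else:
        prob = logistic_cdf(x + half_bin, mu, s) - logistic_cdf(x - half_bin, mu, s)
    return math.log(prob)

# Sanity check: the probabilities over all 256 bins telescope to exactly 1,
# so this really is a valid discrete distribution for any mu and s.
total = sum(math.exp(discretized_logistic_log_prob(p, 0.0, 0.1)) for p in range(256))
```

The averaged negative log-probability over a dataset is then a proper discrete likelihood loss, even though the underlying model outputs are continuous parameters (mu and s).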