During my undergraduate days, one of the most interesting courses I took was on coding and compression. Here was a course that combined algorithms, probability, and secret messages; what's not to like? [1] I ended up not going down that career path, at least partially because communications systems had their heyday around the 2000s with companies like Nortel and Blackberry and their predecessors (some like to joke that all the major theoretical breakthroughs were made by Shannon with his discovery of information theory around 1950). Fortunately, I eventually wound up studying industrial applications of classical AI techniques and then machine learning, which has really grown like crazy in the last 10 years or so. Which is exactly why I was so surprised that a new and better method of lossless compression was developed in 2009, after I had finished my undergraduate degree and was well into my PhD. It's a bit mind-boggling that something as well-studied as entropy-based lossless compression still had totally new methods left to discover, but I digress.
In this post, I'm going to write about a relatively new entropy-based encoding method called Asymmetric Numeral Systems (ANS), developed by Jaroslaw (Jarek) Duda. If you've ever heard of Arithmetic Coding (probably best known for its use in JPEG compression), ANS runs in a very similar vein. It can generate codes that are close to the theoretical compression limit (similar to Arithmetic Coding) but is much more computationally efficient. It's been used in modern compression algorithms since 2014, including compressors developed by Facebook, Apple, and Google. As usual, I'm going to go over some background, some math, some examples to help with intuition, and finally some experiments with a toy ANS implementation I wrote. I hope you're as excited as I am; let's begin!
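The "theoretical compression limit" mentioned above is the Shannon entropy of the source: the minimum average number of bits per symbol any lossless code can achieve for a memoryless source. As a quick illustration (the helper name and the example distribution are mine, not from any particular library), here's a minimal sketch of computing that limit:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits per symbol: the theoretical lower
    bound on average code length for a memoryless source with
    the given symbol probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example: a skewed 3-symbol source with P(a)=0.7, P(b)=0.2, P(c)=0.1.
# No code can average fewer bits per symbol than this entropy;
# ANS (like Arithmetic Coding) gets very close to it.
probs = [0.7, 0.2, 0.1]
print(f"{shannon_entropy(probs):.4f} bits/symbol")  # ≈ 1.1568
```

For comparison, a symbol-by-symbol prefix code like Huffman would need at least 1 bit per symbol here, noticeably above the ~1.157-bit limit for this skewed distribution, which is exactly the gap that stream codes like Arithmetic Coding and ANS close.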