MDL Linear Regression

For my first post of real content, I’ve decided to comment on a document I have just finished writing. It discusses Jorma Rissanen’s MDL (minimum description length) linear regression criterion, presented in “MDL Denoising” [J. Rissanen, IEEE Transactions on Information Theory, Vol. 46, No. 7, 2000]. I’ve wanted to understand the mathematics behind this paper for quite some time, and last week I finally sat down and worked through it.

The result (available here) is a detailed, step-by-step derivation of Rissanen’s criterion. I think it is significantly easier to follow than the terser derivation in the original paper, and I hope someone will find it useful :) Over the next couple of weeks, time permitting, I plan to do the same for several more of J. Rissanen’s papers, in particular “Fisher Information and Stochastic Complexity” and “Strong Optimality of the Normalized ML Models as Universal Codes and Information in Data”.