An introduction to Minimum Description Length (MDL) for linguists

March 29, 2023
Join Zoom Meeting
Starts at 18:00 (Moscow time)


John Goldsmith

Edward Carson Waller Distinguished Service Professor
University of Chicago



MDL analysis is the invention of statisticians, who think of themselves as designing tools that should in principle be usable and useful for intelligent people, regardless of what they are studying. Traditional statistical tools have been helpful for some linguists (notably in sociolinguistics and phonetics), but they have not been helpful for most linguists. MDL, though, tries to come to grips with notions that are much more familiar to linguists: linguists have always cared about finding solutions that are as simple as possible, and about measuring how well our analyses do justice to the data. MDL offers some conceptual tools for thinking about those questions.

At the heart of MDL analysis lies the notion of probabilistic grammars, which are motivated by reasons that, contrary to what one might expect, have nothing to do with frequency or variation. They offer a novel and quite interesting way to connect the work of the grammar to the linguist's data, one entirely different from trying to separate grammatical from ungrammatical sentences.

I'll illustrate the idea by looking at two simple questions: discovering words in continuous discourse, and discovering morphemes within words.