Type  Seminar (7 ECTS) 
Lecturer  Dr. Jilles Vreeken 
jilles (plus) its14 (at) mpi-inf.mpg.de  
Meetings 
Tuesdays, 10–12 o'clock in Room E1.7 323. 
Max Capacity  10 students. See here how to apply. 
Summary  In this seminar we'll be investigating the following questions: What is interesting and meaningful structure, and how can we identify it in data? What are good models for our data when we don't know of any decent priors, don't have clear expectations, or don't even know what we're looking for? What is the ultimate model for our data, and how can we approximate this model in practice? We'll be exploring these questions in light of Algorithmic Information Theory (AIT) and its practical variant, the Minimum Description Length (MDL) principle. 
Month  Day  Type  Topic  Slides  Reading 

Oct  28  L  Practicalities, Introduction  [3] Ch 1.1–1.10  
Nov  4  L  Kolmogorov Complexity [1,2]  [3] Ch 2.1–2.2, 2.8  
Nov  11  D  Kolmogorov in Action  [4,5,6]  
Nov  18  L  The Minimum Description Length principle  [7] Ch 1  
Nov  25  L  Coding  [3] Ch 1.11 & [7] Ch 3  
Dec  2  L&D  Practical Coding 1: MDL4BMF  [8,9]  
Dec  9  L&D  Practical Coding 2: Sequences (and Graphs)  [10,11]  
Dec  16  D  Structure Functions  [12,13], ([7] Ch 17.8)  
Dec  23  –  Winter break  
Dec  30  –  Winter break  
Jan  6  L  Refined MDL (Prequential)  [7] Ch 6.4, 17.5  
Jan  13  L  MML — model selection using priors  [7] Ch 17.4, [14]  
Feb  10  L  Course Round-Up  [7] Ch 1–20, [3] Ch 1–8.19 
Lecture type key:
Every student will have to give a presentation and write a report on an assigned topic. The presentation should be between 15 and 20 minutes long, and will be followed by a 10-minute discussion. The style of the presentation should be that of a lecture: your fellow students should be able to follow it with only the previous lectures of the course as background, you should discuss only those details that are necessary, and you should make use of examples where possible. Students are welcome to send their top-3 preferred topics to the lecturer. Topics will be assigned by the lecturer, who will try to follow the preferences indicated by the students. Or not.
In addition to the presentation, students will have to hand in a report that discusses the assigned topic. Unlike the presentation, which has to be high-level, the report is where you are allowed to show off what you've learned. Summarize the material you read as clearly as you can in your own words, identifying the key contributions and the most interesting or important aspects, relating the topic to previous lectures in the course and/or papers read for the course, all the while correctly referring to the sources you've drawn from. The expected length of the report is 3 pages, but you are free to use as many pages as you like.
Students will be graded on the slides, the presentation, their answers in the discussion, and the report. Students may ask for feedback on their slides once a week, up to one week before the presentation at the latest. The reading materials are provided as starting points. For some topics the provided materials will be (by far) (way) (more than) enough; for others you may want to gather more material. When in doubt, contact the lecturer.
The deadline for the slides is right before the meeting (i.e. 10:00) in which you have to give your presentation.
The deadline for the report is February 10th at 17:00 Saarbrücken standard time. You are free to hand in earlier.
All required reading will be made available here. You will need a username and password to access the papers outside the MPI network. Contact the lecturer if you don't know the username or password.
[1]  On Tables of Random Numbers. The Indian Journal of Statistics, Series A, 25(4):369–376, 1963. 
[2]  Three Approaches to the Quantitative Definition of Information. Problemy Peredachi Informatsii, 1(1):3–11, 1965. 
[3]  An Introduction to Kolmogorov Complexity and its Applications. Springer, 1993. 
[4]  The similarity metric. IEEE Transactions on Information Theory, 50(12):3250–3264, 2004. 
[5]  Clustering by Compression. IEEE Transactions on Information Theory, 51(4):1523–1545, 2005. 
[6]  On data mining, compression and Kolmogorov Complexity. Data Mining and Knowledge Discovery, 15(1):3–20, Springer-Verlag, 2007. 
[7]  The Minimum Description Length Principle. MIT Press, 2007. 
[8]  mdl4bmf: Minimum Description Length for Boolean Matrix Factorization. ACM Transactions on Knowledge Discovery from Data, 2014. 
[9]  Mining Top-K Patterns from Binary Datasets in presence of Noise. In Proceedings of the 10th SIAM International Conference on Data Mining (SDM), Columbus, OH, pages 165–176, 2010. 
[10]  The Long and the Short of It: Summarizing Event Sequences with Serial Episodes. In Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China, ACM, 2012. 
[11]  VoG: Summarizing Graphs using Rich Vocabularies. In Proceedings of the SIAM International Conference on Data Mining (SDM), Philadelphia, PA, SIAM, 2014. 
[12]  A Structure Function for Transaction Data. In Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ, pages 558–569, SIAM, 2011. 
[13]  Kolmogorov's Structure functions and Model Selection. IEEE Transactions on Information Theory, 50(12):3265–3290, 2004. 
[14]  Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 49(3):240–265, Wiley, 1987. 
In general terms, the course will consist of
Students should have basic working knowledge of data analysis and statistics, e.g. by successfully having taken courses related to data mining, machine learning, and/or statistics, such as Topics in Algorithmic Data Analysis, Machine Learning, Probabilistic Graphical Models, Statistical Learning, Information Retrieval and Data Mining, etc.
The course has two hours of scheduled meetings per week. The first weeks will feature regular lectures covering the basic topics of the course. During the second phase the students will write essays based on the material covered in the lectures and scientific articles assigned by the lecturer. We will discuss materials in detail during the meeting. During the third phase the students will write an essay based on scientific articles assigned to them by the lecturer and will prepare a presentation to be held during the meeting. In addition, the students will write an essay about an assignment of their choosing to put their acquired knowledge to the test.
There will be no weekly tutorial group meetings.
The seminar is fully booked. It — or a course much like it — may be offered again in Winter Semester 2016.