The Elements of Machine Learning WS'20


Register and indicate your tutorial group preference here before 14:00 November 5th

News


Course Information

Type: Basic Lecture (6 ECTS) for BSc DSAI; Advanced Lecture (6 ECTS) for all others
Lecturers: Prof. Dr. Jilles Vreeken and Prof. Dr. Isabel Valera
Assistants: Osman Ali Mian (lead), Miriam Rateike, Joscha Cueppers
Tutors: Anika Fuchs, Lakshmi Rajendram, Mohammad Yaseen, Muneeb Aadil, Neda Foroutan, Nisha George
Email: eml-ta (at) mmci.uni-saarland.de
Lectures: Thursdays, 14–16 o'clock via Zoom and YouTube
Tutorials: Mondays and Tuesdays, 12–14 o'clock via Zoom
Office hours: Prof. Dr. Jilles Vreeken and Prof. Dr. Isabel Valera after each lecture; assistants by appointment
Summary

In this course we will discuss the foundations – the elements – of machine learning. In particular, we will focus on how, given a data set, to choose an appropriate method for analyzing it, to select appropriate parameters for the model generated by that method, and to assess the quality of the resulting model. Both theoretical and practical aspects will be covered. The material is relevant for computer scientists in general as well as for other scientists involved in data analysis and modeling. (This course replaces the course Elements of Statistical Learning, and will be held in English.)

Prerequisites

The course is targeted at students in computer science, bioinformatics, math, and the general sciences with a mathematical background. Students should know linear algebra and have good basic knowledge of statistics, for example from having taken Mathematics for Computer Scientists I and II (for linear algebra) and Statistics Lab or Mathematics for Computer Scientists III (for statistics). We provide a self-test that you can use to evaluate whether you have the required background to attempt EML.

Registration

To enroll for the course, express your preference for a tutorial group, and obtain the password, please register here. The deadline is 14:00 November 5th.

Starting November 2nd, we will send out the password by email in daily batches.

The tutorial assignment will be announced on November 6th.

Preliminary Schedule

Month Day Topic Slides Assignment Req. Reading
Organization and Introduction PDF self-test ESL 1, ISLR 1
Nov 5 Statistical Learning 1st sheet out ESL 2, ISLR 2
12 Linear Regression I ESL 3, ISLR 3
19 Linear Regression II deadline 1st, 2nd out ESL 3, ISLR 3
26 Classification ESL 4, ISLR 4
Dec 3 Resampling Methods deadline 2nd, 3rd out ESL 7, ISLR 5
10 Model Selection and Regularization ESL 3, ISLR 6
17 Dimensionality Reduction deadline 3rd, 4th out ESL 3, 14, ISLR 6, 7
24 yay holiday – no class
31 yay holiday – no class
Jan 7 Beyond Linear ESL 5, 9, ISLR 7
14 Trees and Forests deadline 4th, 5th out ESL 9, ISLR 8
21 Support Vector Machines ESL 12, ISLR 9
28 Unsupervised Learning I deadline 5th, 6th out ESL 14, ISLR 10, [4]
Feb 4 Unsupervised Learning II ESL 14, ISLR 10
11 final assignment sheet deadline 6th
18 registration deadline exam
25 exam (preliminary date)
March 18 re-exam (preliminary date)

Materials

The course will, by and large, follow the book "An Introduction to Statistical Learning with Applications in R" [1]. At times the course will draw on additional material from the book "The Elements of Statistical Learning" [2]. The former is the more introductory text; the latter is more advanced. Both books are available as free PDFs. We strongly encourage you, though, to acquire at least the first book in print. If you need to brush up on statistics, we recommend "All of Statistics" [3]. All books, as well as further background literature, are available via the library in a so-called Semesterapparat (course reserve).

[1] James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning with Applications in R. Springer, 2013.
[2] Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Springer, 2009.
[3] Wasserman, L. All of Statistics. Springer, 2005.

For selected lectures we will identify interesting optional reading, such as relevant recent research papers. These we will make available here.

[4] van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.

Assignments

Each problem set will cover theoretical proofs and programming exercises with roughly equal weight. In general, the deadline is at 10:00 (Saarbrücken local time) on the day indicated in the schedule. You are free to hand in earlier. Further details will be announced in the first lecture.

As the programming language we will use R, a language for statistical computing. It is freely available for Windows, Linux and Mac. As a vectorized programming language, it is ideally suited for the problems we will encounter. There are also many freely available packages (or libraries) to perform a variety of classification and regression tasks, or to visualize the results of statistical analyses in a convenient way. In Tutorial 0 (see below) we provide a quick introduction to R.
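To illustrate what "vectorized" means in practice, here is a small sketch of our own (not taken from the assignments): the same sum of squares computed once with an explicit loop and once with a single vectorized expression. The variable names are illustrative only.

```r
# Illustrative example only: sum of squares computed two ways.
x <- c(2, 4, 6, 8)

# Explicit loop, as one might write it in C or Java.
total_loop <- 0
for (xi in x) {
  total_loop <- total_loop + xi^2
}

# Vectorized: x^2 squares every element of x at once, sum() adds the results.
total_vec <- sum(x^2)

print(c(loop = total_loop, vectorized = total_vec))
```

Both variants give the same result; the vectorized form is shorter and, for large vectors, typically much faster in R because the element-wise work happens inside compiled code.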

You hand in your solution as follows. For the theoretical exercises, you may hand in your solutions in handwritten form before the lecture, or send one PDF file with all the answers by email to eml-ta (at) mmci.uni-saarland.de. For the programming exercises, send a single email containing both your R code as an .R file (it should run without errors via the command "Rscript YourCode.R") and a PDF answering the questions and showing the generated plots (if any).
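As a rough orientation, a programming submission might look like the minimal sketch below, which runs non-interactively with Rscript and writes its plot to a PDF file. The file names and the use of R's built-in cars data set are illustrative assumptions only; the actual tasks and data come from each assignment sheet.

```r
# YourCode.R: hypothetical skeleton of a submission script (illustration only).
# Run it non-interactively with:  Rscript YourCode.R
# Assumption: R's built-in `cars` data set stands in for the assignment data.

# Open a PDF device so plots are written to a file instead of a screen window.
pdf("YourCode_plots.pdf")

# Fit a simple linear regression of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)

# Print the estimated coefficients; Rscript shows this on the console.
print(coef(fit))

# Scatter plot of the data with the fitted regression line.
plot(cars$speed, cars$dist,
     xlab = "speed (mph)", ylab = "stopping distance (ft)")
abline(fit, col = "red")

# Close the device so the PDF is written to disk.
invisible(dev.off())
```

Opening the PDF device explicitly keeps the script independent of any interactive graphics window, so it behaves the same whether it is run from RStudio or via Rscript on the command line.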

No. Handout Date due Discussed on Assignment Sheet Additional Material
0 20 Oct 2020 Self-assessment ozone data
1 5 Nov 2020 19 Nov 2020 23 and 24 Nov 2020 available per the handout date

Assignment sheet 0 is an ungraded self-test. If you can solve the problems on this sheet without too much effort and without looking anything up, you're ready to take EML. If you have trouble with these exercises, you should probably use the time until the course starts to boost your knowledge of linear algebra and statistics. The lectures Statistics Lab and Mathematics for Computer Scientists I and II are recommended in particular. A good textbook on statistics is "All of Statistics" [3].

Tutorials

The tutorials focus on the problem sets, but we will also give a (very) brief recap of parts of the lecture. If you have any questions about the lecture, write an email to eml-ta (at) mmci.uni-saarland.de.

There will be one tutorial per week. In the week after an assignment deadline, the solutions will be presented in the tutorial sessions on Monday and Tuesday at 12:00. We will also help you with the current problem set. In the following week, we will return the corrected sheets to you in your Monday or Tuesday session. We will also recapitulate the lectures and leave some time for discussion.

Please indicate your preference for a tutorial group by registering for the course. There are two options: Monday 12-14 and Tuesday 12-14. Registration closes on November 5, 14:00. We will announce the final group assignment on November 6th.

No. Date Type
0 Introduction to R (recorded video on YouTube: EML 20/21 Tutorial 0)
1 09/10 Nov 2020 assignment assistance

R resources

R (version 3.2.3) is installed on the CIP pool computers and can be started by invoking R from the command line.

The official web site of the R project is r-project.org. You can download R for Windows, Linux and Mac from there. Additional packages, documentation and tutorials are also available for download from the official web site. Useful manuals and tutorials include:

The CRAN Contributed Documentation lists many other tutorials for R beginners and advanced programmers.

You can also check out RStudio, an open-source IDE for R.

Grading and Exam

To be admitted to the exam, you need at least 50% of the points for the theoretical exercises and at least 50% of the points for the programming exercises.

To successfully participate, you need to register for the exam in the LSF/HISPOS system of Saarland University. This will be possible as soon as the exam date has been entered into the system (this usually happens a few weeks into the semester).

The final exam will be oral if student numbers allow, and written otherwise. The final decision on this will be made three weeks into the course. The exact dates will depend both on the type of exam and on what the university health regulations allow at that time. The final exam will cover all the material discussed in the lectures and the required reading.

Students for whom EML is a required part of their Bachelor programme can take it as a Basic Lecture, while all other students (Bachelor or Master) can take it as an Advanced Lecture. There is no difference between the two except for how the course is listed on your transcript; either way, you receive 6 ECTS for successfully completing the exam. (Obviously, you can only get credits for the course once.)

Acknowledgements

This course was originally developed by Thomas Lengauer, and we thank him for kindly providing his lecture materials and experience.