The Elements of Machine Learning WS'20


Course Information

Type: Basic Lecture (6 ECTS) for BSc DSAI; Advanced Lecture (6 ECTS) for all others
Lecturers: Prof. Dr. Jilles Vreeken and Prof. Dr. Isabel Valera
Assistants: Osman Ali Mian (lead), Miriam Rateike, Joscha Cueppers
Tutors: Anika Fuchs, Lakshmi Rajendram, Mohammad Yaseen, Muneeb Aadil, Neda Foroutan, Nisha George
Email: eml-ta (at) mmci.uni-saarland.de
Lectures: Thursdays, 14–16 o'clock via Zoom and YouTube
Tutorials: Mondays and Tuesdays, 12–14 o'clock via Zoom
Office Hours: Prof. Dr. Jilles Vreeken and Prof. Dr. Isabel Valera after each lecture; assistants by appointment
Summary

In this course we will discuss the foundations – the elements – of machine learning. In particular, we will focus on the ability, given a data set, to choose an appropriate method for analyzing it, to select the appropriate parameters for the model generated by that method, and to assess the quality of the resulting model. Both theoretical and practical aspects will be covered. What we cover will be relevant for computer scientists in general as well as for other scientists involved in data analysis and modeling. (This course replaces the course Elements of Statistical Learning, and will be held in English.)

Prerequisites

The course is aimed at students in computer science, bioinformatics, math, and general sciences with a mathematical background. Students should know linear algebra and have good basic knowledge of statistics, for example by having taken Mathematics for Computer Scientists I and II (for linear algebra) and Statistics Lab or Mathematics for Computer Scientists III (for statistics). We provide a self-test that you can use to evaluate whether you have the required background to attempt EML.

Exam

To be eligible to participate in the exams, you need to have scored, cumulatively over all sheets, at least 50% of the points for the theoretical exercises and at least 50% of the points for the programming exercises.
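As a rough illustration of the eligibility rule, the following sketch checks both thresholds. It assumes that the 50% threshold applies to each category separately, that points are summed over all sheets, and that exactly 50% is sufficient; the function name and the point totals are hypothetical and not part of the course infrastructure.

```python
def is_exam_eligible(theory_scored, theory_total, programming_scored, programming_total):
    """Return True if at least 50% of the points were scored in each category.

    Assumption: totals are summed over all assignment sheets, and exactly
    50% counts as sufficient.
    """
    # 2 * scored >= total is the same as scored / total >= 0.5,
    # but avoids a division (and a zero total).
    theory_ok = 2 * theory_scored >= theory_total
    programming_ok = 2 * programming_scored >= programming_total
    return theory_ok and programming_ok


# Hypothetical totals of 120 points per category.
print(is_exam_eligible(62, 120, 60, 120))  # True: both categories reach 50%
print(is_exam_eligible(90, 120, 59, 120))  # False: programming is below 50%
```

Note that a strong result in one category does not compensate for a shortfall in the other under this reading.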

To participate in the exams, you will need to register at least one week before the exam via the LSF/HISPOS system of Saarland University. If this is not possible, register with us via email.

The final exams will be written and held online. All material covered in the lectures, slides, exercises, and required reading is relevant.

The re-exam will be on Thursday, March 25th. We will distribute the PDF here from 13:50 onward, so that you can download and, if you wish, print it. The exam officially starts at 14:00. You will have until 16:30 to prepare your solutions, which you need to upload as a ZIP file to the CMS by 17:00 at the latest. This is a hard deadline. (Clearly indicated links to both the exam PDF and the CMS submission site will be made available on this site.) (If disaster strikes and the CMS is down, we accept solutions by email via eml-ta (at) mmci.uni-saarland.de, but only if they reach us by the deadline.)

The exam will be open book, meaning that you are allowed to consult the slides, the lecture videos, the books, etc. We will design the exam, however, such that it strongly favours those who studied over those who plan to look things up. There is a big difference between consulting and straight-up copying, and it will come as no surprise that we do not condone plagiarism. Exams are to be done individually.

You are allowed to write your solutions and derivations on paper using a black or blue pen and may submit a ZIP-file with clearly readable JPG-pictures or PDF-files – for example taken using a smartphone camera or scanning app – of your answers by the deadline. You may use plain or lined paper, but as it often causes readability issues when digitized, we do not allow squared paper.

You are also allowed to write your solutions digitally – for example using LaTeX, Word, or a notepad app on your tablet – and include PDF files of these answers in the ZIP-file.

Your solution file should be named "matriculation#-exam.zip", e.g. "2424242-exam.zip". The individual solution JPG or PDF files inside the ZIP should be named such that it is immediately clear to which questions they are relevant, e.g. "2424242-exam-q1.jpg", "2424242-exam-q2,q3a.pdf", "2424242-exam-q3bc.jpg", etc. If you submit one PDF file, you can simply name it "2424242-exam-answers.pdf".
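The naming scheme above can be sketched as a small script. This is only an illustration, not an official submission tool: the helper names `answer_file_name` and `build_exam_zip` are hypothetical, and the sketch assumes your scanned or exported answers already exist as local JPG or PDF files.

```python
import zipfile
from pathlib import Path


def answer_file_name(matriculation, questions, extension):
    """Build e.g. '2424242-exam-q1.jpg' from a matriculation number and a question label."""
    return f"{matriculation}-exam-{questions}.{extension}"


def build_exam_zip(matriculation, answers, out_dir="."):
    """Pack answer files into '<matriculation>-exam.zip' and return its path.

    `answers` maps a question label (e.g. 'q1' or 'q2,q3a') to the path of
    a local JPG or PDF file; files are renamed inside the archive only.
    """
    zip_path = Path(out_dir) / f"{matriculation}-exam.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for questions, source in answers.items():
            source = Path(source)
            # Keep the original file type, normalised to lower case.
            extension = source.suffix.lstrip(".").lower()
            archive.write(source, arcname=answer_file_name(matriculation, questions, extension))
    return zip_path


print(answer_file_name("2424242", "q2,q3a", "pdf"))  # 2424242-exam-q2,q3a.pdf
```

For example, `build_exam_zip("2424242", {"q1": "scan1.jpg", "q2,q3a": "notes.pdf"})` would write "2424242-exam.zip" to the current directory, provided the two (hypothetical) input files exist.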

Students who do not have a smartphone or other option to take a picture of their solutions should let us know as soon as possible via email.

To students who do not have access to a suitable environment for writing the exam, we can offer a spot in one of the lecture halls. If you want to make use of this option, let us know at the latest one week before the exam. Wearing an FFP2 mask will be mandatory for the duration of the exam.

Communication

All official communication to students will be done via this website, and/or via the eml-all mailing list to which you will be automatically subscribed after registration via CMS.

All questions and other communication to lecturers, TAs, and tutors have to be sent by email to eml-ta (at) mmci.uni-saarland.de using your university account. Emails from other providers, e.g. Gmail, will be ignored. Impolite messages will at best be ignored.

To facilitate interaction between students, such as forming groups, as well as collaboratively working on assignments within groups, EML has an optional (moderated) Discord server that you can join via this invite. Behave.

Lectures

Month | Day | Topic | Slides | Assignment | Req. Reading
- | - | Organization (EML 20/21, Lecture 0, Organizational Details) | PDF | self-test | ESL 1, ISLR 1
Nov | 5 | Statistical Learning | PDF | 1st sheet out | ESL 2, ISLR 2
Nov | 12 | Linear Regression I | PDF | - | ESL 3, ISLR 3
Nov | 19 | Linear Regression II | PDF | deadline 1st, 2nd out | ESL 3, ISLR 3
Nov | 26 | Classification | PDF | - | ESL 4, ISLR 4
Dec | 3 | Resampling Methods | PDF | deadline 2nd, 3rd out | ESL 7, ISLR 5
Dec | 10 | Model Selection and Regularization | PDF | - | ESL 3, ISLR 6
Dec | 17 | Dimensionality Reduction | PDF | deadline 3rd, 4th out | ESL 3, 14, ISLR 6, 7
Dec | 24 | yay holiday – no class | - | - | -
Dec | 31 | yay holiday – no class | - | - | -
Jan | 7 | Beyond Linear | PDF | - | ESL 5, 9, ISLR 7
Jan | 14 | Trees and Forests | PDF | deadline 4th, 5th out | ESL 9, ISLR 8
Jan | 21 | Support Vector Machines | PDF | - | ESL 12, ISLR 9
Jan | 28 | Unsupervised Learning I | PDF | deadline 5th, 6th out | ESL 14, ISLR 10, [4]
Feb | 4 | Unsupervised Learning II | PDF | - | ESL 14, ISLR 10
Feb | 11 | final assignment sheet | - | deadline 6th | -
Feb | 18 | registration deadline exam | - | - | -
Feb | 25 | exam (online) | - | - | -
March | 18 | registration deadline re-exam | - | - | -
March | 25 | re-exam (online) | - | - | -

Links to the Zoom meeting and YouTube live-stream will appear one hour before the lecture starts.

Materials

The course will, by and large, follow the book "An Introduction to Statistical Learning with Applications in R" [1]. At times the course will take additional material from the book "The Elements of Statistical Learning" [2]. The former book is the more introductory text; the latter is more advanced. Both books are available as free PDFs. We strongly encourage you, though, to acquire at least the first book in print. If you need to brush up on statistics, we recommend "All of Statistics" [3]. All books, as well as further background literature, are available via the library in a so-called Semesterapparat.

[1] James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning with Applications in R. Springer, 2013.
[2] Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Springer, 2009.
[3] Wasserman, L. All of Statistics. Springer, 2005.

For selected lectures we will identify interesting optional reading, such as relevant recent research papers. These we will make available here.

[4] van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.

Assignments

Each problem set will cover theoretical proofs and programming exercises with roughly equal weight. In general, the deadlines are on the day indicated in the schedule at 14:00 Saarbrücken standard-time. You are free to hand in earlier. Further details will be announced in the first lecture.

As the programming language we will use R – a language for statistical computing. It is freely available for Windows, Linux and Mac. As a vectorized programming language, it is ideally suited for the problems we will encounter. There are also many freely available packages (or libraries) to perform a variety of classification and regression tasks, or to visualize the results of statistical analyses in a convenient way. In Tutorial 0 (see below) we provide a quick introduction to R.

Hand in your solutions in the CMS as visually depicted here. In particular, upload a separate PDF file for each theoretical exercise. For the practical exercise, upload one ZIP file that contains both a main.r file and one PDF file with all plots and answers to questions. Please follow our Guidelines for the Practical Assignments.

No. | Handout | Date due | Discussed on | Assignment Sheet | Additional Material
0 | 20 Oct 2020 | - | - | Self-assessment | ozone data
1 | 5 Nov 2020 | 19 Nov 2020 | 23 and 24 Nov 2020 | Linear Regression | ozone data
2 | 19 Nov 2020 | 3 Dec 2020 | 7 and 8 Dec 2020 | Classification | phoneme data
3 | 3 Dec 2020 | 17 Dec 2020 | 4 and 5 Jan 2021 | Resampling & Regularization | prostate data
4 | 17 Dec 2020 | 14 Jan 2021 | 18 and 19 Jan 2021 | Dimensionality & Splines | prostate data
5 | 14 Jan 2021 | 28 Jan 2021 | 1 and 2 Feb 2021 | Trees & Nuts | wdbc data
6 | 28 Jan 2021 | 11 Feb 2021 | 15 and 16 Feb 2021 | Unsupervised Learning | A6Data1, A6Data2

Assignment sheet 0 is an ungraded self-test. If you can solve the problems of this sheet without too much effort and without looking anything up, you're ready to take EML. If you have trouble doing these exercises, you should probably use the time until the course starts to boost your knowledge of linear algebra and statistics. The lectures Statistics Lab and Mathematics for Computer Scientists I and II are recommended in particular. A good textbook on statistics is "All of Statistics" [3].

Tutorials

The tutorials focus on the problem sets. In the week after you submit an assignment, we will present and discuss the solutions. In the week after that, you will receive the corrected sheets, and we will help you with the current problem set. Depending on time and popular demand, we will also briefly recap parts of the lecture. If you have any questions about the lecture, write to eml-ta (at) mmci.uni-saarland.de.

No. | Date | Title | Materials
0 | - | Introduction to R | EML 20/21 Tutorial 0 - Introduction to R (YouTube)
1 | 09/10 Nov 2020 | How to Submit, Assistance with Assignment 1 | How to Submit, Guidelines for Practical Assignments
2 | 16/17 Nov 2020 | Assistance with Assignment 1 | -
3 | 23/24 Nov 2020 | Discussing Assignment 1 | -
4 | 30 Nov/01 Dec 2020 | Assistance with Assignment 2 | -
5 | 07/08 Dec 2020 | Discussing Assignment 2 | -
6 | 14/15 Dec 2020 | Assistance with Assignment 3 | -
7 | 04/05 Jan 2021 | Discussing Assignment 3 | -
8 | 11/12 Jan 2021 | Assistance with Assignment 4 | -
9 | 18/19 Jan 2021 | Discussing Assignment 4 | -
11 | 25/26 Jan 2021 | Assistance with Assignment 5 | -
12 | 01/02 Feb 2021 | Discussing Assignment 5 | -
13 | 08/09 Feb 2021 | Assistance with Assignment 6 | -
14 | 15/16 Feb 2021 | Discussing Assignment 6 | -

Links to the Zoom meeting will appear 1 hour before the tutorial starts. Tutorials will be held via Zoom only.

During the Discussion tutorials, the TAs and tutors will present the correct solutions and answer questions about them. During the Assistance tutorials, the focus is on giving individual and group-wise assistance. We will answer questions that are raised publicly and also use break-out rooms to answer questions individually. To ask for individual assistance, either use the raise-hand function of Zoom or simply send a direct message to the TA. They will then get in touch with you and, if necessary, create a break-out room to which you and a tutor will be assigned and in which you can communicate via voice.

R resources

R (version 3.2.3) is installed on the CIP pool computers and can be started by invoking R from the command line.

The official web site of the R project is r-project.org. You can download R for Windows, Linux and Mac from there. Additional packages, documentation and tutorials are also available for download from the official web site. Useful manuals and tutorials include:

The CRAN Contributed Documentation lists many other tutorials for R beginners and advanced programmers.

You can also check out RStudio, an open-source IDE for R.

Registration

Students for whom EML is a required part of their Bachelor programme can take it as a Basic Lecture, while all other students (Bachelor or Master) can take it as an Advanced Lecture. There is no difference between the two except for how the course is listed on your transcript; either way you receive 6 ECTS for successfully completing the exam. (Obviously, you can only get credits for the course once.)

Registration for EML is closed. Late registrations can be requested via email.

Online Lectures, Zoom, and Privacy

We would be happy if we could create a pleasant lecture environment despite the current situation. Personal interactions, with your camera switched on, may contribute to this environment. We also encourage you to ask questions verbally. Note that this is voluntary. You may switch off both your camera and your microphone, and register under a pseudonym. Questions are still possible, in particular using the chat function.

We have decided to use Zoom as a videoconferencing service. Note that this provider (Zoom Video Communications, Inc., 55 Almaden Blvd, Suite 600, San Jose, CA 95113, USA) can access all data that you provide when registering for the video conference. Even if you do not provide personal data during the registration, there is still a possibility that Zoom identifies you using your IP address. We would not have decided to use Zoom if we considered this a significant risk. As an additional precaution, we have opted to use European computing centers. Should you still have privacy concerns (and are using an Internet Service Provider that can map IP addresses to your name), we suggest using an anonymization service such as Tor.

You can find Zoom's complete privacy policy here.

Acknowledgements

EML is based on The Elements of Statistical Learning as developed by Thomas Lengauer. We thank him for kindly sharing both materials and experience.