Welcome to this unit on Machine Learning, which is given for the second time in the academic year 2018/19.

Machine Learning is the science of how we can build abstractions of the world from data. In this unit we will start with the fundamental underlying principles and philosophies that allow us to learn, and then look at how we can formulate these using explicit models. We will then look at how we can fit these models to data in order to "learn". The unit is not aimed at giving you a set of tools that you can use to do machine learning, but rather at giving you an understanding so that you can yourself design, choose and understand the tools that are around.

Machine learning is mathematical in nature, and a good grasp of linear algebra and multivariate calculus is recommended in order to fully digest the material. There is lots of good material on the internet that you can use to brush up your skills. Particularly good is the upcoming (free) book Mathematics for Machine Learning. If you feel it has been a long time since you did maths, or you feel a bit rusty, I recommend that you have a look through this book. A lot of the work we do is with matrices; a very good resource for matrix algebra is The Matrix Cookbook. It doesn't really provide any understanding, but if you need a resource to quickly look up a derivative, this is the place to start.
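For example, two standard identities of the sort you will end up looking up in the Matrix Cookbook (for vectors a, x and a matrix A):

    \frac{\partial}{\partial x}\left(a^\top x\right) = a,
    \qquad
    \frac{\partial}{\partial x}\left(x^\top A x\right) = \left(A + A^\top\right) x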

Material

You can download all the material for the course by checking out the following GitHub repo. Keep in sync with this repository to access the lectures, coursework and additional material. I am unlikely to keep the links on this website up to date, so it is the material in the repository that counts. The Blackboard site is not used for anything other than accessing the video recordings of the lectures.

Reddit

There is a Reddit feed associated with the unit. The idea is for this forum to be the main form of interaction outside of lectures. Due to the size of this unit I am not able to interact over email, so please direct your questions to the Reddit forum rather than to my inbox.

Ethnography study

According to Wikipedia, "Ethnography is the systematic study of people and cultures. It is designed to explore cultural phenomena where the researcher observes society from the point of view of the subject of the study."

Machine learning has, and is expected to continue to have, an ever greater impact on society, but the interaction goes both ways: society also affects machine learning as a topic. I am very excited to have ethnography PhD student Kate Byron as part of the team this year. Kate is doing research on how social and cultural norms shape machine learning practices. During the unit Kate will run a focus group to discuss the topic and conduct interviews on these questions. I think this is a great chance for anyone who is interested not only in the technical details but also in the "bigger picture" of the topic in society. If you are interested in getting involved, please contact Kate Byron.

Lectures

We have two one-hour lectures per week, Monday 15-16 and Tuesday 16-17, both in the Queens Building lecture theatre 1.40. Following the lecture on Tuesday we will have one hour of curated lecture where you tell me what I should talk about. This could either be me going through some of the material from previous lectures in more detail, or a request for something outside the unit that you want to hear about. We will use the Reddit feed to decide on the material, so please upvote things that you would like to hear about and downvote things that you would not.

  1. Introduction (w1)
  2. Basic Probabilities (w1)
  3. Distributions and further Probabilities (w2)
  4. Linear Regression (w2)
  5. Dual Linear Regression (w3)
  6. Gaussian Processes (w3)
  7. Gaussian Processes II and Unsupervised Learning (w4)
  8. Bayesian Optimisation (w4)
  9. Dirichlet Processes (w5)
  10. Topic Models (w5)
  11. Graphical Models (w6)
  12. Inference I: Laplace Approximation (w6)
  13. Inference II: Stochastic Methods (w7)
  14. Inference III: Variational Methods (w7)
  15. Neural Networks (w9)
  16. (TBC) (w9)
  17. (TBC) (w10)
  18. (TBC) (w10)
  19. (TBC) (w11)
  20. (TBC) (w11)

Last year I wrote a summary document of the lectures that can be found here. This document summarises the key points of each lecture and is a good starting point for testing your more abstract understanding of each of the topics. It will be updated during the unit, so make sure that you stay in sync with the repository.

Practical

Each week we have two hours of lab sessions. During the labs you will work on the coursework (more on this later). You are free to use whatever programming language you want for the unit, but if you have an interest in continuing to work in machine learning I strongly recommend that you get used to working with Python, as it is quickly becoming the standard language for ML. To help people kick their MATLAB habit we will use the first lab session for a crash course in Python.

The unit is assessed through two pieces of coursework and a written exam. The coursework should be done in groups of two and submitted in the form of a report. The first assignment is worth 35% of the mark while the second is worth 15%. In addition to submitting a report you should also submit your code. If you for one reason or another want to work alone on the coursework, come and talk to me first and we will see what we can do. The coursework is marked independently of group size.

Models

The first coursework will cover the material from week 1 to the beginning of week 4 and is mostly focused on how to reason about probabilities and how to build models of data. We will see the strength of priors and look at different ways of incorporating assumptions.
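To give a flavour of what "incorporating assumptions through a prior" can look like in code, here is a minimal, illustrative sketch (not part of the coursework; the prior variance alpha and noise variance sigma2 are invented for the example) of Bayesian linear regression, where a Gaussian prior over the weights combines with the data to give a closed-form posterior:

    import numpy as np

    # Bayesian linear regression sketch: prior p(w) = N(0, alpha * I),
    # likelihood p(y | X, w) = N(X w, sigma2 * I).  alpha and sigma2 are
    # illustrative choices, not values from the coursework.
    rng = np.random.default_rng(0)
    alpha, sigma2 = 1.0, 0.1

    # Toy data drawn from a known line, so the posterior has something to find.
    X = np.column_stack([np.ones(20), rng.uniform(-1.0, 1.0, 20)])
    w_true = np.array([0.5, -1.0])
    y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), 20)

    # The Gaussian prior and likelihood combine into a Gaussian posterior
    # p(w | X, y) = N(m, S) in closed form (Bishop, Chapter 3).
    S = np.linalg.inv(np.eye(2) / alpha + X.T @ X / sigma2)
    m = S @ X.T @ y / sigma2

    print("posterior mean:", m)            # close to w_true once enough data is seen
    print("posterior covariance:\n", S)    # shrinks as more data is observed

Changing alpha changes how strongly the prior pulls the weights towards zero, which is exactly the kind of assumption the coursework asks you to reason about.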

You can download the description of the coursework here. The deadline is the 2nd of November, when you will submit your report on SAFE.

Inference

The second coursework will focus on how we can perform approximate inference in models where it is computationally or analytically intractable to compute the marginal distribution in closed form. You can download the coursework here.
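As a taste of what approximate inference can look like in practice (again an illustrative sketch rather than anything taken from the coursework; the target density and step size are invented for the example), the random-walk Metropolis sampler below draws samples from a distribution using only its unnormalised density, without ever computing the intractable marginal:

    import numpy as np

    def log_unnorm(theta):
        """Unnormalised log-density of a toy bimodal target; the normalising
        constant (the intractable marginal) is never computed."""
        return np.logaddexp(-0.5 * (theta - 2.0) ** 2,
                            -0.5 * (theta + 2.0) ** 2)

    def metropolis(log_p, n_samples=5000, step=1.0, seed=0):
        """Random-walk Metropolis: propose theta' ~ N(theta, step^2) and accept
        with probability min(1, p(theta') / p(theta))."""
        rng = np.random.default_rng(seed)
        theta, samples = 0.0, np.empty(n_samples)
        for i in range(n_samples):
            proposal = theta + step * rng.normal()
            if np.log(rng.uniform()) < log_p(proposal) - log_p(theta):
                theta = proposal
            samples[i] = theta
        return samples

    samples = metropolis(log_unnorm)
    print("mean of samples:", samples.mean())   # close to 0 for this symmetric target
    print("std of samples:", samples.std())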

This coursework will be released at the beginning of week 7, and the deadline is the 30th of November, when you will submit your report on SAFE.

Feedback

The sheer volume of participants on this unit makes coursework challenging. For me, machine learning is something that you learn by iterating between theory and practical implementation, so I believe it is important to keep the coursework as a part of this unit. I also believe that consistent marking is important, and I therefore want to mark the coursework on my own rather than dividing the work up between me and the TAs. To make this equation work, something has to give: this year I have decided that, rather than giving everyone individual feedback, I will have a dedicated lecture slot where I go through the coursework. This is not ideal, but I believe it is the best we can do.

Book

We will use Bishop, C. M., Pattern Recognition and Machine Learning (2006) for most parts of this unit. The book is an excellent introduction to machine learning and one of the few that provides a consistent and rigorous narrative rather than falling into the trap of becoming a cookbook for practitioners.

The key to reading this book is to understand the first chapter well; if you understand this, the rest of the book should be decipherable. So do spend a lot of time reading through that first chapter and really grasp the implications of what is said.

Reading List

Lecture   Bishop                           Other Material
1         —
2         1.1, 1.2.1-4, 1.3, 1.4
3         2-2.3.5, 2.3.9, (2.4)
4         3.1, 3.3-3.6
5         6-6.2
6         6.4-6.4.2
7         6.4.3-6.4.4, 12.2-12.2.1
8         —
9         —
10        —
11        8.1-8.3
12        4.3-4.4
13        11.1.0-2, 11.1.4, 11.2.0, 11.3
14        10.1, 10.3
15        —
16        5-5.3
17        —
18        —
19        —