Advanced topics in high-performance computing

(MCSC 6230G/7230G)

Fall 2017

Faisal Qureshi

faisal.qureshi@uoit.net

Nov 29, 2017

Last lecture.

Nov 13, 2017

Important information about project presentations and reports has been posted on the course Slack channel.

Nov 13, 2017

Assignment 3 is now available.

Oct 19, 2017

Assignment 2 is now available.

Oct 10, 2017

One-page project proposals are due **Oct. 31**.

Oct 10, 2017

Code examples are available on GitHub. See below.

Oct 5, 2017

The paper presentation schedule is now available. Please check the course Slack channel.

Sep 29, 2017

Assignment 1 is now available.

Sep 26, 2017

The paper reading list is now available.

Aug 28, 2017

Website is now online.

**Faisal Qureshi**

Email: faisal.qureshi@uoit.net

Office: UA4032

We will be using Slack for online communication. Please ensure that you are enrolled in the following Slack channel:

mcsc-ml-f17-uoit.slack.com.

Lectures:

- Wed, 12:40 - 3:30 pm in ERC3027

Office hours:

- Tue, 1 - 2 pm in UA4032
- Or by appointment

This is an introductory graduate course in machine learning. It focuses on both supervised and unsupervised learning methods, covering both theory and practice. The course is geared towards students who wish to develop a working knowledge of recent advances in machine learning and of how these advances are applied in various domains.

Machine learning deals with how to design computer programs that learn from “experience.” Residing at the intersection of computer science and statistics, machine learning aims to extract useful information from data (often referred to as the *training data*) and to leverage this information to create computer models capable of carrying out useful, non-trivial tasks, such as driving cars autonomously, filtering junk email, and helping diagnose diseases. By many accounts, machine learning is the “greatest export” of computer science (and statistics) to other disciplines.

The course will cover the following topics:

- regression;
- classification;
- clustering;
- dimensionality reduction;
- mixture models; and
- neural networks and deep learning.

The course assumes that students are comfortable with statistics, basic linear algebra, and programming.

We live in exciting times. A wealth of information about machine learning is available on the internet. Check out the "awesome machine learning" list on GitHub for a collection of machine learning courses and free, open-source books.

**Important:** Each lecture will include a programming activity, so please bring your laptop to the lectures. Also ensure that your laptop has Python, NumPy, SciPy, Matplotlib, and scikit-learn (Sklearn) installed. The easiest way to achieve this is to download the Anaconda Python distribution.
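As a quick check before the first lecture, a short script like the following can confirm that the required packages can be found by your Python interpreter. It is a minimal sketch, not course-provided code: it assumes the import names `numpy`, `scipy`, `matplotlib`, and `sklearn` (scikit-learn's import name), and it only checks that each package is present, not which version is installed.

```python
import importlib.util

# Import names assumed for the packages listed above
# ("Sklearn" is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each top-level import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Run it with the same interpreter you plan to use in class (for Anaconda, the one in your Anaconda environment). A package reported as MISSING can be installed with `conda install` or `pip install`; note that scikit-learn is installed under the name `scikit-learn`, not `sklearn`.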

*Tony Joseph will lead the first lecture. I am away at a conference in Berlin.*

The goal is to use k-means or mean shift to cluster the "make circles" dataset into two clusters. [Code]
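A minimal sketch of this activity with scikit-learn follows. It is not the linked course code: the dataset parameters (`n_samples=400`, `noise=0.05`, `factor=0.4`), the bandwidth `quantile=0.3`, and the helper `ring_agreement` are illustrative assumptions. It also runs k-means on each point's distance from the origin, a hand-crafted feature added here to show how much the data representation matters for this dataset.

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift, estimate_bandwidth
from sklearn.datasets import make_circles

# Two concentric rings; n_samples, noise and factor are illustrative choices.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)

# K-means with k = 2 on the raw (x, y) coordinates. Each point goes to its
# nearest centroid, so the two clusters are split by a straight line.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Mean shift does not take the number of clusters as input; the result
# depends on the kernel bandwidth, estimated here from the data.
bandwidth = estimate_bandwidth(X, quantile=0.3, random_state=0)
meanshift_labels = MeanShift(bandwidth=bandwidth).fit_predict(X)

# One possible fix (an assumption, not the only approach): cluster on the
# distance from the origin, which separates the two rings in one dimension.
radius = np.linalg.norm(X, axis=1).reshape(-1, 1)
radial_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(radius)


def ring_agreement(labels, truth):
    """Fraction of points grouped with their true ring (two clusters, label order ignored)."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)


print("k-means on (x, y):  ", ring_agreement(kmeans_labels, y))
print("k-means on radius:  ", ring_agreement(radial_labels, y))
print("mean-shift clusters:", np.unique(meanshift_labels).size)
```

Because the boundary between two k-means clusters is a straight line, k-means on the raw coordinates cannot recover two nested rings. The number of clusters mean shift returns depends on the bandwidth, so it is not guaranteed to be two.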

Experiments with linear regression. [Code] [Data]
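The course's own code and data for this exercise are not reproduced here. As an illustration, the sketch below fits a straight line to synthetic data (an assumption: y = 2 + 3x plus Gaussian noise) with NumPy, solving the least-squares problem with `np.linalg.lstsq` and checking the result against the normal equations (X^T X) w = X^T y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 2 + 3x + Gaussian noise.
n = 100
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept, then the feature x.
X = np.column_stack([np.ones(n), x])

# Least-squares solution of X w ~ y; lstsq avoids forming X^T X explicitly.
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# The same solution from the normal equations (X^T X) w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

print("intercept, slope:", w)
```

Both routes give the same coefficients here; `lstsq` is generally the safer choice when columns of X are nearly collinear, because forming X^T X squares the condition number.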

- TensorFlow (convolutional network for CIFAR-10 image classification)

We will continue our discussion of Gaussian processes.
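For reference, a minimal NumPy sketch of Gaussian process regression follows. It assumes a zero-mean prior, a squared-exponential (RBF) kernel, and Gaussian observation noise with variance σ²; the function names `rbf_kernel` and `gp_posterior`, the hyperparameter values, and the sin(x) training data are illustrative choices, not taken from the lectures. With K = k(X, X) + σ²I, the posterior at test inputs has mean K_s^T K^{-1} y and covariance K_ss - K_s^T K^{-1} K_s, computed here through a Cholesky factorization.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel for 1-D inputs: variance * exp(-(a - b)^2 / (2 * length_scale^2))."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP with Gaussian observation noise."""
    # K includes the noise term: K = k(X, X) + noise_var * I.
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)   # train x test
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)   # test x test
    # Factor K = L L^T and use triangular solves instead of forming K^{-1}.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)                                 # L^{-1} K_s
    cov = K_ss - v.T @ v                                        # K_ss - K_s^T K^{-1} K_s
    return mean, cov


# Usage: nearly noise-free samples of sin(x) at eight points (illustrative data).
x_train = np.linspace(-3.0, 3.0, 8)
y_train = np.sin(x_train)
x_test = np.linspace(-4.0, 4.0, 100)
mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=1e-6)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise predictive std
```

Near the training inputs the predictive standard deviation shrinks toward zero; far from them the mean reverts to the prior mean (zero) and the variance to the prior variance.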

- Course project presentations

Each student needs to select a relevant machine learning paper and give a 20-minute presentation outlining the contributions, strengths, and weaknesses of that paper. To get the process moving, I have started to put together a list of papers. Each of you is asked to select one paper that catches your interest. Ties will be resolved first come, first served (FIFO).

*Feel free to suggest another relevant machine learning paper*

Each paper presentation is roughly 20 minutes long and is followed by a discussion. All of you are expected to have read the paper before coming to the lecture. The presentation should focus on the key contribution of the paper and on how the topics covered in the paper fit into the larger machine learning landscape. Pay close attention to how the paper is written, how ideas are presented, how methods are developed, how arguments are structured, and how results are used to bolster the paper's key idea.

The presentation schedule and papers are available via the course Slack channel.

Students can work on projects individually or in pairs. The project can be on an interesting topic that students come up with on their own or with the help of the instructor. The grade will depend on the ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are, and how thoughtful your conclusions are. (*Taken from Raquel Urtasun's CSCI 2515 project description.*)

- In class, Wed., Nov. 29
- 10 min total: 7 min presentation and demo, 3 min Q&A and discussion
- To keep the session on schedule, please upload your project presentation in PDF format via Blackboard by midnight on Tue., Nov. 28.
- Project report due on Fri., Dec. 8, 11:59 pm

- Assignment 1: **Due back Oct 11, 11:59 pm**
- Assignment 2: **Due back Nov 1, 11:59 pm**
- Assignment 3: **Due back Nov 27, 11:59 pm**

I recommend reading Part 1 of “Deep Learning” by I. Goodfellow, Y. Bengio and A. Courville to brush up on linear algebra and statistics. The book is available online here.

We will be using Python for the programming part of this course. I recommend the Anaconda distribution, which comes preloaded with nearly all the packages that we will be using in this course. Of course, you are welcome to use any variant or distribution of Python that suits you.

Here you’ll find a number of tutorials showcasing the use of Python in machine learning. I strongly recommend that you become comfortable with the following Python packages and tools:

- numpy;
- scipy;
- matplotlib;
- jupyter notebook; and
- TensorFlow.

- "Deep Learning" by I. Goodfellow, Y. Bengio and A. Courville
- "Pattern Recognition and Machine Learning" by C. Bishop
- "Machine Learning: A Probabilistic Perspective" by K. Murphy
- "Generalized Linear Models" by P. McCullagh and J.A. Nelder
- "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by T. Hastie, R. Tibshirani and J. Friedman
- "Machine Learning with TensorFlow" by N. Shukla (**This book is particularly useful for those interested in applied machine learning.**)

Code examples used in this course are available on Github (https://github.com/uoit-ml/mcsc-ml).