Topics in Digital Media

(CSCI 5550G)

Fall 2018

Faisal Qureshi

faisal.qureshi@uoit.net

Dec 2, 2018

Course project presentations will take place on Monday, Dec. 10 at 12 pm in UA4170. Check the course Slack for more information.

Nov 29, 2018

Last lecture on Friday, Nov. 30

Nov 23, 2018

Course project presentations will take place in the last week of class

Sep 27, 2018

Check out Blackboard for course work submission.

Sep 10, 2018

Papers are available here.

Aug 13, 2018

Website is now online.

**Faisal Qureshi**

Email: faisal.qureshi@uoit.net

Office: UA4032

We will be using Slack for online communication. Please ensure that you have joined the following Slack workspace:

csci-5550g-f18-uoit.slack.com.

- Fri, 2:10 - 5:00 pm in ERC2056

- Fri, 1 - 2 pm in UA4032
- Or by appointment

This is an introductory graduate course in machine learning and computer vision. The course will focus on machine learning theory and methods for computer vision applications. The course is geared towards students who wish to develop a working knowledge of the recent advances in machine learning, and how these advances have led to increasingly powerful computer vision systems.

Machine learning deals with how to design computer programs that learn from “experience.” Residing at the intersection of computer science and statistics, machine learning aims to extract useful information from data (often referred to as the *training data*) and leverages this information to create computer models capable of carrying out useful, non-trivial tasks, such as designing cars that can drive on their own, filters for blocking junk email, diagnostics tools for disease discovery, analyzing images for scene understanding, etc. By many accounts machine learning is the “greatest export” of computer science (and statistics) to other disciplines.

Computer vision deals with processing and analyzing digital images to extract useful properties about the real world. Computer vision, for example, can be used to extract 3D scene structure from a given set of photos, recognize people in images, identify actions in a video sequence, etc. Computer vision has also been used in specialized domains, such as medical imaging, say for analyzing CT or MRI scans, and satellite imaging, say for analyzing the health of an ecosystem. Computer vision has also found widespread use in the entertainment and gaming industries.

Solving computer vision, it turns out, is a tough problem. Digital images, after all, are little more than a collection of pixels. Recent advances in machine learning, especially in deep learning, have opened up new avenues for computer vision research. The goal is simple: design algorithms and systems that will enable a computer to "learn to see" by "looking" at example pictures and videos. With this in mind, this course will explore machine learning approaches that have found widespread use in computer vision applications.

This course will mix lectures on a selection of topics with paper reading and discussion. The topics are selected to help you understand and implement the papers that you are asked to read, present, and discuss. The first 45 minutes of most classes will be devoted to lectures on one of the selected topics. The remaining time will be used for paper presentation and discussion. The course will cover the following topics:

- image formation and camera models;
- optical flow;
- depth analysis;
- action recognition;
- convolution (filtering);
- regression;
- classification;
- clustering;
- dimensionality reduction; and
- neural networks and deep learning.

These topics provide a decent basis for understanding the papers that we plan to read and discuss in this course.

The course assumes that students are comfortable with statistics, basic linear algebra, and programming.

I recommend reading Part 1 of “Deep Learning” by I. Goodfellow, Y. Bengio and A. Courville to brush up on linear algebra and statistics. The book is available here.

We will be using Python for the programming part of this course. For Python, I recommend the Anaconda distribution, which comes pre-loaded with nearly all the packages that we will be using in this course. Of course, you are welcome to use any variant/distribution of Python that suits you.

The course also assumes that students are willing to read and comprehend large volumes of technical papers, and that they have some experience with technical report writing.

You will find the following computer vision books useful:

- *Fundamentals of Computer Vision* by Mubarak Shah
- *Multiple View Geometry in Computer Vision* by Richard Hartley and Andrew Zisserman
- *Computer Vision: Algorithms and Applications* by Richard Szeliski
- *Computer Vision: Models, Learning, and Inference* by Simon J.D. Prince

The following books are good resources for machine learning, especially deep learning:

- *Neural Networks and Deep Learning: A Textbook* by Charu C. Aggarwal
- *Deep Learning* by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- *Machine Learning: A Probabilistic Perspective* by Kevin P. Murphy
- *Understanding Machine Learning: From Theory to Algorithms* by Shai Shalev-Shwartz and Shai Ben-David
- *Pattern Recognition and Machine Learning* by Christopher M. Bishop
- *The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition* by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

These resources will not only help you understand the assigned papers, but may also prove invaluable for your course projects.

Here you’ll find a number of tutorials showcasing Python use in machine learning. I strongly recommend that you become comfortable with the following four Python packages/environments:

- numpy;
- scipy;
- matplotlib; and
- jupyter notebook.
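As a first taste of numpy, here is a minimal sketch (not taken from the course tutorials) that fits a line to noisy synthetic data by least squares, which ties in with the regression topic covered in the lectures. The data, the noise level, and all variable names are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic data: y = 2x + 1 plus Gaussian noise (values chosen for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.shape)

# Design matrix with a column of ones so that the intercept is learned too.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of X @ [slope, intercept] ~= y.
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The recovered slope and intercept should land close to 2 and 1. Plotting `x`, `y`, and the fitted line with matplotlib inside a Jupyter notebook is a natural next step.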

- Course project, 40% (**A student needs to get 60% marks in the project to successfully complete the course.**)
  - Proposal
  - Progress report
  - Presentations
  - Technical report
- Participation and interactions, 20%
  - Discussion, readings, QA, class exercises
- Paper presentation and leading discussion, 30%
- One pagers, 10%
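To make the weighting concrete, the following sketch combines the four components into a mark out of 100 and checks the project requirement separately. It assumes that every component is scored as a percentage and that the 60% requirement applies to the project score on its own; the function name, dictionary keys, and example scores are hypothetical.

```python
# Component weights taken from the grading scheme above (they sum to 1.0).
WEIGHTS = {
    "project": 0.40,
    "participation": 0.20,
    "paper_presentation": 0.30,
    "one_pagers": 0.10,
}
PROJECT_PASS_MARK = 60.0  # assumed: minimum project percentage to complete the course


def final_mark(scores):
    """Return (weighted mark out of 100, whether the project threshold is met).

    `scores` maps each component name to a percentage in [0, 100].
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total, scores["project"] >= PROJECT_PASS_MARK


# Hypothetical scores, for illustration only.
example = {
    "project": 70,
    "participation": 90,
    "paper_presentation": 80,
    "one_pagers": 100,
}
mark, project_ok = final_mark(example)
print(mark, project_ok)
```

With these example scores the weighted mark works out to about 80, and the project requirement is met.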

- One pagers
  - Pager 1: Oct. 1, midnight
  - Pager 2: Oct. 15, midnight
  - Pager 3: Nov. 1, midnight
  - Pager 4: Nov. 15, midnight
- Project
  - proposal: Oct. 7, midnight
  - progress report: Nov. 20, midnight
  - final report: Dec. 15, midnight
  - project presentation: last two weeks of classes
- Presentations
  - Throughout the term

Imaging geometry and camera calibration.

Fundamentals of Computer Vision, Ch. 1.

Structure from Motion

Fundamentals of Computer Vision, Sec. 5.5.

Photo Tourism: Exploring Photo Collections in 3D, ACM Transactions on Graphics 2006 (presented by Samantha Stahlke).

Motion models and image filtering

Fundamentals of Computer Vision, Ch. 5.

Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004 (presented by Michael Stergianis).

Histograms of Oriented Gradients for Human Detection, CVPR 2005 (presented by Ghazal Reshad).

Probabilistic view of linear regression

ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012 (presented by Tony Joseph).

Exploring linear regression (Jupyter notebook)

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, ICCV 2015 (presented by Hunter Thomas).

Linear regression using PyTorch 1

Linear regression using PyTorch 2

Linear regression using PyTorch 3

MNIST Classification [Solution]

DeLiGAN: Generative Adversarial Networks for Diverse and Limited Data, CVPR 2017 (presented by Samantha Stahlke).

Fully Convolutional Networks for Semantic Segmentation, CVPR 2015 (presented by Ghazal Reshad).

Learning to See by Moving, ICCV 2015 (presented by Hunter Thomas)

Course project presentations will take place on Monday, December 10 at 12 pm in UA 4170.

Below I collect code samples that we have been using during the lectures.

- Linear regression 1
- Linear regression 2
- Linear regression 3
- Logistic regression 1
- Logistic regression 2

Each student will be assigned recent papers to read and present. The student will be responsible for leading the discussion for this paper. **Each student may be assigned to present multiple papers.**

Please find the list of papers here.

Check out here for upcoming paper presentation assignments. Use your uoit.net address to access this document and enter your paper preferences.

Here is some advice from Prof. S. Keshav on how to read a paper. Do read it; it is excellent!

- Duration, 40 minutes
- Create a slideshow
- Easy to read
- Avoid verbosity
- Use figures, examples
- Clear and easily understandable structure
- Practice your talk before the lecture!

- Key questions
  - What does the paper do?
  - What are its limitations?
  - What are its strengths?
  - Is this paper reproducible?
  - How does the paper support its key arguments?
  - What software does the paper use?
  - What datasets does the paper use?
  - How does this paper fit within the larger body of literature?

- Read the paper before the lecture
- Be prepared to answer questions
- Be prepared to participate in the discussion
- Provide feedback to the presenter
  - Compliments, suggestions, criticism, thanks

To be announced.

The course project is an independent exploration of a specific problem within the context of this course. A project can be implementation oriented---where a student implements a computer vision system---or application oriented---where a student attempts to solve a problem (of suitable difficulty) by applying machine learning techniques. The project topic will be selected in consultation with the instructor.

The project grade will depend on the ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are, and how thoughtful your conclusions are.

**The course project is an individual effort.**

- one page (12 pt)
- clear and concise problem statement
- discuss its relevance
- why is it an interesting problem to solve (level of difficulty)
- describe other related approaches
- sketch your approach
- list anticipated difficulties

- one page (12 pt)
- describe the problem you are working on (this should include any feedback that you've received on your project proposal)
- describe your approach in more detail
- summarize your accomplishments to date
- list next steps
- list any problems that you encountered, and how you solved them
- identify any problems that you expect to encounter

- 15 minutes
- the problem description with a motivation
- a quick overview of related work
- the proposed solution
- a technical description of the solution
- encountered difficulties
- an evaluation
- future work and conclusion

For your final project write-up you must use the ACM SIG Proceedings Template (available at the ACM website). The project report may be at most 12 pages long, plus extra pages for references. Your report must be of "publishable quality," i.e., free of typos and grammatical errors.

The final deadline for project report submission is the 15th of December, midnight EST. *This is a firm deadline.* You will incur a penalty of 40% if you do not meet this deadline. These strict rules mimic the conference submission process:

- a predefined format;
- limited amount of space to explain your ideas and contribution; and
- firm submission deadline.

A one pager is a summary of the paper (assigned reading for that week). A one pager should not be more than 1 page long (12 pt font). The summary should describe what the paper is doing, along with its strengths and weaknesses. It should also identify possible future directions for research. A one pager is marked according to the following rubric:

- Not submitted, 0 marks;
- Submitted and is of satisfactory quality, 1 mark; and
- Submitted and is of exceptional quality (i.e., raises questions that go beyond the scope of the paper), 2 marks.