Description

This graduate-level course offers an introduction to computer vision, with emphasis on both theoretical foundations and real-world applications.

At its core, computer vision is concerned with processing and analyzing digital images to extract meaningful information about the physical world. Applications range from reconstructing 3D scene structure from photographs, to recognizing people in images, to identifying actions in video sequences. Beyond these general tasks, computer vision plays a vital role in specialized domains such as medical imaging (e.g., analyzing CT or MRI scans), remote sensing (e.g., monitoring ecosystems through satellite imagery), and the entertainment and gaming industries.

Despite its successes, computer vision remains a challenging field. Digital images are essentially arrays of pixels, and inferring structure, semantics, or dynamics from them is non-trivial. Recent advances in machine learning—particularly deep learning—have transformed the field by enabling algorithms that can effectively “learn to see” from large collections of example images and videos. To reflect this, the course will also introduce modern machine learning methods that have become central to contemporary computer vision research and applications.

This course blends short lectures with interactive paper readings and discussions. The lectures will give you the background you need to dive into research papers, understand their methods, and try out key ideas yourself. In most classes, the first 45 minutes will focus on a core topic, and the rest of the time will be spent presenting papers and exploring them together in discussion.

We’ll cover topics such as:

how images are formed and how cameras model the world;
motion and optical flow;
depth perception and 3D analysis;
recognizing actions in video;
convolution and filtering;
regression and classification;
clustering and dimensionality reduction; and
neural networks and deep learning.

These topics will give you the tools and intuition to make sense of the papers we read and to actively engage with the exciting questions in computer vision research.

Pre-requisites

The course assumes that students are comfortable with statistics, basic linear algebra, and programming.

We will be using Python for the programming part of this course. For Python, I recommend the Anaconda distribution, which comes pre-loaded with nearly all the packages that we will be using in this course. Of course, you are welcome to use any variant/distribution of Python that suits you.

The course also assumes that students are willing to read and comprehend large volumes of technical papers and have some experience with technical report writing.

Grading

Course project, 40% (A student must earn at least 60% on the project to successfully complete the course.)
- Proposal
- Progress report
- Presentations
- Technical report
Participation and interactions, 10%
- Discussion, readings, QA, class exercises
Paper presentation and leading discussion, 15%
Midterm, 35% (A student must earn at least 40% on the midterm to successfully complete the course.)
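
The weights and minimums above can be combined as in the following minimal sketch. It assumes every component is marked out of 100; the names `WEIGHTS`, `MINIMUMS`, `final_grade`, and `meets_minimums` are hypothetical and used only for illustration, and the example scores are made up. It checks only the two component thresholds listed above and does not model any overall pass mark.

```python
# Weights taken from the grading scheme above (percent of the final grade).
WEIGHTS = {
    "project": 40,
    "participation": 10,
    "paper_presentation": 15,
    "midterm": 35,
}

# Minimum component scores stated above (assumed to be out of 100).
MINIMUMS = {"project": 60, "midterm": 40}


def final_grade(scores):
    """Return the weighted total (0-100) for component scores given out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def meets_minimums(scores):
    """Check only the two per-component thresholds, not an overall pass mark."""
    return all(scores[name] >= minimum for name, minimum in MINIMUMS.items())


# Made-up scores, for illustration only.
example = {"project": 70, "participation": 90, "paper_presentation": 80, "midterm": 50}
print(final_grade(example), meets_minimums(example))
```

Keeping the weights in one dictionary makes it easy to confirm that they sum to 100 before computing a total.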

Important dates

Midterm
- Week 10, November 13, closed book in class
Project
- proposal: Oct. 10, midnight
- progress report: Oct. 31, midnight
- final report: Dec. 8, midnight
- project presentation: last two weeks of classes
Presentations
- Throughout the term

Ontario Tech University’s academic calendar, which lists important dates and deadlines, is available here.

Course calendar

Week 1 - Introduction, image formation, stereo
- Image formation and pinhole camera model
  Everything up to “Modeling radial distortion to capture (and undo) lens effects” is included in the midterm.
Week 2 - Linear regression
- ML introduction
- Linear regression
  - Derivation
Week 3 - Logistic regression and multi-class classification
Week 4 - Neural networks
- Neural networks
Week 5 and 6 - Convolutional neural networks
- Linear layers
- CNN overview
Week 7 - Images as functions
Week 8 - Paper Presentations
Week 9 - Paper Presentations
Week 10 - Midterm
Week 11 - Paper Presentations
Week 12 - Project Presentations and other stuff

The list of assigned papers will be available after the first week of classes. Please check the course website for details.

Midterm preparation

The midterm will cover the topics discussed during the lectures. Additionally, you may find the following notes useful.

Computer Vision Papers

A collection of computer vision papers is available at https://github.com/jbhuang0604/awesome-computer-vision. The papers are organized by topic. Please pick at least five papers, spanning two different areas, from topics that interest you. At least three of the five papers should be recent.

See the course Canvas page for more instructions.

Course Work

Midterm

The midterm will take place in class.
The midterm will be closed-book.
A student must receive at least 40% on the midterm to pass the course.

Presentation

Each student will be assigned recent papers to read and present. The student will be responsible for leading the discussion of each paper they present. Each student may be assigned to present multiple papers.

We will follow a role-playing paper-reading seminar format. This means that each of us will be expected to read the paper and assume a role. The roles will shift from paper to paper. Check out the advice on how to read a paper by S. Keshav.

Instructions for the presenter

Duration, 30 minutes
Create a slideshow
- Easy to read
- Avoid verbosity
- Use figures, examples
- Clear and easily understandable structure
- Practice your talk before the lecture!
Key questions
- What does the paper do?
- What are its limitations?
- What are its strengths?
- Is this paper reproducible?
- How does the paper support its key arguments?
- What software does the paper use?
- What datasets does the paper use?
- How does this paper fit with the larger body of literature?

Instructions for the participants

Read the paper before the lecture
Be prepared to answer questions
Be prepared to participate in the discussion
Provide feedback to the presenter
- Compliments, suggestions, criticism, thanks

Project

The course project is an independent exploration of a specific problem within the context of this course. A project can be implementation oriented—where a student implements a computer vision system—or application oriented—where a student attempts to solve a problem (of suitable difficulty) by applying machine learning techniques. The project topic will be selected in consultation with the instructor.

The project grade will depend on your ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are, and how thoughtful your conclusions are.

The course project is typically an individual effort.

Project topics

Projects must be related to computer vision theory, methods, and systems. A project that simply uses a pre-trained deep learning model, say YOLO or a network pre-trained on ImageNet, to solve some larger “task” is not appropriate. Such a project simply applies a pre-built system to the task at hand. I want us to have an opportunity to implement the computer vision systems that underpin all these different applications.

Possible topics are:

Estimating 3D locations from multiple cameras or from a single moving camera;
Analyzing human poses and actions;
Constructing features for image matching;
Enhancing, completing and colorizing images and videos;
Estimating optical flow;
Analyzing sports videos;
Recognizing humans, animals, birds and flowers;
Analyzing traffic images and videos;
Recognizing human facial expressions;
etc.

In many cases it is difficult to work with real cameras and hardware. In these situations you can implement and evaluate your algorithms using simulated data. For example, you can use a game engine to simulate traffic images captured at a road intersection.

Project proposal

one page (12 pt)
clear and concise problem statement
discuss its relevance
why is it an interesting problem to solve (level of difficulty)
describe other related approaches
sketch your approach
list anticipated difficulties

Progress Report

one page (12 pt)
describe the problem you are working on (this should include any feedback that you’ve received on your project proposal)
describe your approach in more detail
summarize your accomplishments to date
list next steps
list any problems that you encountered, and how you solved or otherwise worked around them
identify any problems that you expect to encounter

Final in-class Presentation

15 minutes
the problem description with a motivation
a quick overview of related work
the proposed solution
a technical description of the solution
encountered difficulties
an evaluation
future work and conclusion

Final Report

For your final project write-up you must use the ACM SIG Proceedings Template (available at the ACM website). The project report may be at most 12 pages long, plus extra pages for references. Your report must be of “publishable quality,” i.e., free of typos and grammatical errors.

The final deadline for project report submission is 11th of December, midnight EST. This is a firm deadline. You will incur a penalty of 40% if you do not meet this deadline. These strict rules mimic the conference submission process:

a predefined format;
limited amount of space to explain your ideas and contribution; and
firm submission deadline.

Reading material

You will find the following computer vision books useful.

Computer Vision: Algorithms and Applications by Richard Szeliski.
Computer Vision: Models, Learning, and Inference by Simon J.D. Prince.

The following books are good resources for machine learning, especially deep learning:

Neural Networks and Deep Learning: A Textbook by Charu C. Aggarwal.
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy.
Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David.
Pattern Recognition and Machine Learning by Christopher M. Bishop.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

These resources will not only help you understand the assigned papers but may also prove invaluable for your course projects.

Programming Resources

Here you’ll find a number of tutorials showcasing Python use in machine learning. I strongly recommend that you become comfortable with the following four Python packages/environment:
