An Introduction to Computer Vision

Accepted Session
Short form
Scheduled: Thursday, June 18, 2009 from 3:50 – 4:35pm in Morrison

Excerpt

Learn about several computer vision techniques and how to put them together to form an entry-level object classifier.

Description

Computer vision has started to achieve some very impressive results over the last 5-10 years. It is now possible to quickly and reliably detect faces, recognize and localize target images, and even classify pictures of objects into generic categories. Unfortunately, knowledge of these techniques remains largely confined to academia. In this session we’ll go over some of the tools available, placing an emphasis on exploring the ideas and algorithms behind their design.

To show how these components can be put together, a sample system will be developed over the course of the presentation. Starting with standard image descriptors, we’ll first see how to do direct image recognition. We’ll then extend that into a simple object classifier, which will be able to distinguish (for example) between images that contain a bicycle and those that don’t.
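As a rough illustration of the direct image recognition step, the sketch below matches a query image's descriptors against each known image and picks the image with the most matches. It is not the session's sample code. It assumes descriptors have already been extracted (for example, SIFT vectors) and are available as plain lists of floats. The function names, the ratio threshold of 0.8, and the use of a nearest-neighbour ratio test to discard ambiguous matches are all choices made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, target, ratio=0.8):
    """Return (query_index, target_index) pairs that pass the ratio test.

    `ratio` is an assumed threshold for this sketch, not a value from the talk.
    """
    matches = []
    for qi, q in enumerate(query):
        # Rank every target descriptor by its distance to this query descriptor.
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(target))
        if len(ranked) < 2:
            continue  # the ratio test needs a best and a second-best candidate
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


def recognize(query, database, ratio=0.8):
    """Return the name of the database image with the most matches, plus all scores.

    `database` is a hypothetical dict mapping image names to descriptor lists.
    """
    scores = {
        name: len(match_descriptors(query, descriptors, ratio))
        for name, descriptors in database.items()
    }
    return max(scores, key=scores.get), scores
```

Calling `recognize(query_descriptors, database)` returns the best-scoring image name together with the per-image match counts. A practical system would also check that the matched points are geometrically consistent, which this sketch leaves out.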

Topics covered will include:

  • Image descriptors (Scale-Invariant Feature Transform)
  • Bag-of-words classification
  • Machine learning (Support Vector Machines, boosting)
  • Training datasets
  • Viola-Jones face detection (if time allows)
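To make the bag-of-words idea concrete, here is a minimal sketch, not the session's sample code. Each descriptor in an image is assigned to its nearest "visual word" in a codebook, and the image is summarized as a normalized histogram of word counts. The codebook is assumed to be given; in practice it is typically built by clustering descriptors from the training set. For brevity the classifier here is a nearest-centroid rule, used as a simple stand-in for the Support Vector Machine listed above. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(descriptor, codebook[i]),
    )


def bag_of_words(descriptors, codebook):
    """Normalized histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)  # image with no descriptors
    return [c / total for c in counts]


def train_centroids(histograms, labels):
    """Mean histogram per class label (stand-in for training an SVM)."""
    sums, counts = {}, {}
    for h, label in zip(histograms, labels):
        acc = sums.setdefault(label, [0.0] * len(h))
        for i, v in enumerate(h):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(histogram, centroids):
    """Label whose mean histogram is closest to the given histogram."""
    return min(centroids, key=lambda label: squared_distance(histogram, centroids[label]))
```

Training would compute `bag_of_words` for every labelled image and pass the results to `train_centroids`. A new image is then classified from its own histogram. Because the histogram ignores where in the image each descriptor was found, the same representation works regardless of the object's position.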

Python code for the sample SIFT-matching and bag-of-words classifier applications can be found here.
The slides can be found here and in PDF format here.

Speaker

    Matthew Dockrey

    University of British Columbia

    Biography

    Matthew is a graduate student working in computer vision and robotics at UBC. He was part of the winning team in the 2008 Semantic Robot Vision Challenge, where robots autonomously learn how to recognize objects using only internet image searches and then explore an environment to find them. Currently his research is focused on the semantic segmentation of video frames using the motion of 3D pointclouds derived from stereo camera data.
