Authored by Tony Feng
Created on Mar 9th, 2022
Last Modified on Mar 9th, 2022
Intro
This series of posts summarizes materials and readings from CSCI 1430 Computer Vision, a course I took at Brown University. The course covers the fundamentals of image formation, camera imaging geometry, feature detection and matching, stereo, motion estimation and tracking, image classification, scene understanding, and deep learning with neural networks. I posted these “Notes” (what I’ve learnt) for study and review only.
History of Recognition
Geometric Data
- 1960s – early 1990s
- Camera position and illumination
- Recognition as an alignment problem
- e.g. fitting a model to a transformation between pairs of matched features in two images
- Recognition by components
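Recognition as alignment can be illustrated with a minimal sketch that fits a 2D affine transformation to matched feature pairs by least squares and then measures how well the transformed model points land on the observed ones. The affine model, NumPy, and the helper names `fit_affine` and `alignment_residual` are illustrative assumptions, not the course's specific method.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points.

    src, dst: (N, 2) arrays of matched feature locations, N >= 3.
    Returns a 2x3 matrix A such that dst is approximately A @ [x, y, 1]^T.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source coordinates: each row is [x, y, 1]
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve X @ A.T = dst in the least-squares sense
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_T.T

def alignment_residual(A, src, dst):
    """Mean distance between transformed model points and observed points."""
    src = np.asarray(src, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    pred = X @ A.T
    return float(np.mean(np.linalg.norm(pred - np.asarray(dst, dtype=float), axis=1)))
```

A small residual suggests the model aligns with the image (the object is "recognized"); a large one suggests a poor match. In practice the correspondences contain outliers, so a robust fit such as RANSAC would usually wrap this step.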
Appearance-based Models
- 1990s
- Eigenfaces
- Color Histogram
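Eigenfaces represent each face as a point in a low-dimensional subspace found by principal component analysis, and recognition becomes nearest-neighbor search in that subspace. This is a minimal sketch, assuming faces are already aligned, cropped, and flattened into equal-length grayscale vectors; NumPy and the helper names are hypothetical choices for illustration.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: (N, D) array, each row a flattened face image.
    Returns the mean face (D,) and the top-k eigenfaces as rows of a (k, D) array."""
    faces = np.asarray(faces, dtype=float)
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of Vt are the principal directions (eigenvectors of the covariance)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, eigenfaces):
    """Coordinates of one face in the eigenface subspace."""
    return eigenfaces @ (np.asarray(face, dtype=float) - mean)

def nearest_identity(query, gallery, labels, mean, eigenfaces):
    """Label of the gallery face closest to the query in eigenface space."""
    q = project(query, mean, eigenfaces)
    coords = (np.asarray(gallery, dtype=float) - mean) @ eigenfaces.T
    dists = np.linalg.norm(coords - q, axis=1)
    return labels[int(np.argmin(dists))]
```

SVD of the centered data is used instead of forming the $D \times D$ covariance matrix explicitly, which is cheaper when there are far fewer images than pixels.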
Sliding Window Approaches
- Mid 1990s
- sliding window + image pyramid $\rightarrow$ scale + location
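A minimal sketch of the sliding-window idea: a fixed-size window scans every level of an image pyramid, so location comes from the window position and scale comes from the pyramid level. The 2x2 averaging downsampler, the window size and stride, and `score_fn` (a stand-in for any trained window classifier) are illustrative assumptions.

```python
import numpy as np

def downsample(img):
    """Halve height and width by averaging 2x2 blocks (odd edges are cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def sliding_windows(img, win, stride):
    """Yield (scale, y, x, patch) for a win x win window at every pyramid level.

    (y, x) is in original-image coordinates; the window covers win * scale pixels there."""
    scale = 1
    level = np.asarray(img, dtype=float)
    while level.shape[0] >= win and level.shape[1] >= win:
        for y in range(0, level.shape[0] - win + 1, stride):
            for x in range(0, level.shape[1] - win + 1, stride):
                yield scale, y * scale, x * scale, level[y:y + win, x:x + win]
        level = downsample(level)
        scale *= 2

def detect(img, score_fn, win=8, stride=4, threshold=0.5):
    """Return boxes (y, x, size) whose window score exceeds the threshold."""
    return [(y, x, win * s) for s, y, x, patch in sliding_windows(img, win, stride)
            if score_fn(patch) > threshold]
```

Because the classifier always sees the same window size, large objects are found on coarse pyramid levels and small ones on fine levels. A real detector would also apply non-maximum suppression to the overlapping boxes.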
Local Features
Parts-and-shape Models
- Early 2000s
- Model
- Objects as a set of parts
- Relative locations between parts
- Appearance of each part
- Constellation Models
- Pictorial Structure Model
Bags of Features
- Mid-2000s
- Origins
- Texture Recognition
- Bag-of-words models
Bags of Features
The bag-of-features representation works well for image-level classification and for recognizing object instances.
Steps
- Feature extraction
- Regular Grids
- Interest Regions
- …
- Form a “visual vocabulary”
- Quantize features using the visual vocabulary
- Learn the visual vocabulary
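The steps above can be sketched as follows, assuming local descriptors (e.g. from a regular grid or interest regions) have already been extracted as fixed-length vectors. Learning the vocabulary with plain k-means is an assumption here (a common choice, not the only one), and the function names are hypothetical.

```python
import numpy as np

def quantize(descriptors, centers):
    """Assign each descriptor to its nearest visual word (squared Euclidean distance)."""
    X = np.asarray(descriptors, dtype=float)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def learn_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain k-means over descriptors; the k centroids are the visual words."""
    X = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct descriptors (fancy indexing makes a copy)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = quantize(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """Normalized histogram of visual-word counts for one image."""
    words = quantize(descriptors, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram can be fed to any standard classifier, regardless of how many descriptors each image produced.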
Issues
- How to choose the size of the visual vocabulary?
- Too small: visual words cannot represent all features well
- Too large: overfitting
- Computational efficiency
Spatial Pyramid Matching
Color Histogram
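Images with very different spatial layouts can have the same color histogram (e.g. the same pixels rearranged), because a histogram only counts colors and discards where they occur. How can we encode the spatial layout?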
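To make the question concrete, here is a minimal sketch, assuming RGB images stored as `uint8` NumPy arrays: two images with the same pixels arranged differently produce identical global color histograms, while concatenating histograms over a coarse grid of cells (the idea spatial pyramids build on) tells them apart. The bin count and the 2x2 grid are arbitrary illustrative choices.

```python
import numpy as np

def color_histogram(img, bins=4):
    """Joint RGB histogram with `bins` levels per channel, normalized to sum to 1.
    img: (H, W, 3) uint8 array."""
    q = (img.astype(int) * bins) // 256  # per-channel bin index in 0..bins-1
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def grid_histograms(img, cells=2, bins=4):
    """Concatenate the histograms of a cells x cells grid to keep coarse layout."""
    h, w = img.shape[:2]
    parts = []
    for i in range(cells):
        for j in range(cells):
            block = img[i * h // cells:(i + 1) * h // cells,
                        j * w // cells:(j + 1) * w // cells]
            parts.append(color_histogram(block, bins))
    return np.concatenate(parts)
```

For example, an image that is red on the left and blue on the right has the same global histogram as one that is red on top and blue on the bottom, but their grid histograms differ.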
Pyramids
- A pyramid is built from multiple copies of the image at successively lower resolutions.
- Each level in the pyramid is $\frac{1}{4}$ the size of the previous level (half the width and half the height).
- The lowest level has the highest resolution.
- The highest level has the lowest resolution.
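A minimal sketch of building such a pyramid, assuming a grayscale or color NumPy array. Gaussian pyramids usually blur with a Gaussian kernel before subsampling; the 2x2 box average used here is a simplifying assumption that still gives each level $\frac{1}{4}$ the pixels of the one below it.

```python
import numpy as np

def build_pyramid(img, min_size=1):
    """Return [level0, level1, ...]. Level 0 is the full-resolution image; each
    following level halves height and width by averaging 2x2 blocks."""
    levels = [np.asarray(img, dtype=float)]
    while min(levels[-1].shape[:2]) // 2 >= min_size:
        cur = levels[-1]
        # Crop to even dimensions so every output pixel has a full 2x2 block
        h, w = cur.shape[0] // 2 * 2, cur.shape[1] // 2 * 2
        cur = cur[:h, :w]
        levels.append(0.25 * (cur[0::2, 0::2] + cur[1::2, 0::2]
                              + cur[0::2, 1::2] + cur[1::2, 1::2]))
    return levels
```

For a 16x16 image this yields levels of 16x16, 8x8, 4x4, 2x2, and 1x1 pixels, with the single top pixel equal to the mean of the original image.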