[Computer Vision] Visual Bag of Words

Authored by Tony Feng

Created on Mar 9th, 2022

Last Modified on Mar 9th, 2022

Intro

This series of posts summarizes materials and readings from CSCI 1430 Computer Vision, a course I took at Brown University. The course covers the fundamentals of image formation, camera imaging geometry, feature detection and matching, stereo, motion estimation and tracking, image classification, scene understanding, and deep learning with neural networks. I post these “Notes” (what I’ve learned) for study and review only.


History of Recognition

Geometric Data

Appearance-based Models

Sliding Window Approaches

  • Mid-1990s
  • sliding window + image pyramid $\rightarrow$ scale + location

Local Features

Parts-and-shape Models

Bags of Features

  • Mid-2000s
  • Origins
    • Texture Recognition
    • Bag-of-words models

Bags of Features

The bag-of-features representation works well for image-level classification and for recognizing object instances.

Steps

  • Feature extraction
    • Regular Grids
    • Interest Regions

  • Learn (form) a “visual vocabulary”

  • Quantize features using the visual vocabulary

  • Represent each image by the frequencies of its visual words
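
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: it assumes the vocabulary is learned with k-means (a common choice; the notes do not name the clustering method), it assumes each image ends up as a normalized histogram of visual-word counts, and it uses made-up 2-D descriptors in place of real features such as SIFT. The function names `learn_vocabulary` and `bag_of_words` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (centroid) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def learn_vocabulary(descriptors, k, iterations=20, seed=0):
    """Assumed clustering step: plain k-means; the k centroids are the visual words."""
    rng = random.Random(seed)
    vocabulary = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                vocabulary[i] = [sum(col) / len(members) for col in zip(*members)]
    return vocabulary


def bag_of_words(descriptors, vocabulary):
    """Quantize each descriptor and return a normalized word-frequency histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy descriptors (illustrative values only): two well-separated groups.
toy_descriptors = [[0.0, 0.1], [0.2, 0.0], [10.0, 9.9], [9.8, 10.1]]
toy_vocabulary = learn_vocabulary(toy_descriptors, k=2)
toy_histogram = bag_of_words(toy_descriptors, toy_vocabulary)
```

In practice the vocabulary would be learned once from descriptors pooled over many training images, and `bag_of_words` would then be called per image to produce the vector fed to a classifier.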

Issues

  • How to choose the size of the visual vocabulary?
    • Too small: features are not representative
    • Too large: overfitting
  • Computational efficiency

Spatial Pyramid Matching

Color Histogram

All of these images have the same color histogram. How can we encode the spatial layout?
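
A small sketch can make the problem concrete. It assumes a toy 4x4 "image" stored as a list of rows of RGB tuples, and it illustrates one common answer (the idea behind spatial pyramid matching): compute a histogram per grid cell instead of one global histogram. The helper names `color_histogram` and `grid_histograms` are hypothetical.

```python
from collections import Counter


def color_histogram(image):
    """Global histogram: count each pixel value, ignoring where it occurs."""
    return Counter(p for row in image for p in row)


def grid_histograms(image, cells):
    """One histogram per block of a cells x cells grid, in row-major order."""
    height, width = len(image), len(image[0])
    bh, bw = height // cells, width // cells
    return [
        Counter(image[y][x]
                for y in range(r * bh, (r + 1) * bh)
                for x in range(c * bw, (c + 1) * bw))
        for r in range(cells)
        for c in range(cells)
    ]


RED, BLUE = (255, 0, 0), (0, 0, 255)

# Left half red, right half blue.
left_right = [[RED, RED, BLUE, BLUE] for _ in range(4)]
# Same pixels, different layout: top half red, bottom half blue.
top_bottom = [[RED] * 4, [RED] * 4, [BLUE] * 4, [BLUE] * 4]

same_global = color_histogram(left_right) == color_histogram(top_bottom)
same_grid = grid_histograms(left_right, 2) == grid_histograms(top_bottom, 2)
```

Here `same_global` is true because both images contain eight red and eight blue pixels, while `same_grid` is false because the per-cell histograms record which quadrant each color falls in.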

Pyramids

  • A pyramid is built from multiple copies of the image at different resolutions.
  • Each level in the pyramid is $\frac{1}{4}$ the size of the previous level (half the width and half the height).
  • The lowest level has the highest resolution.
  • The highest level has the lowest resolution.
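
The properties listed above can be checked with a short sketch. It assumes a grayscale image stored as a list of rows and uses 2x2 block averaging as a simple stand-in for the blur-then-subsample step of a Gaussian pyramid; `downsample` and `build_pyramid` are hypothetical helper names.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block of a grayscale image."""
    height = len(image) // 2 * 2      # drop a trailing odd row, if any
    width = len(image[0]) // 2 * 2    # drop a trailing odd column, if any
    return [
        [(image[y][x] + image[y][x + 1]
          + image[y + 1][x] + image[y + 1][x + 1]) / 4
         for x in range(0, width, 2)]
        for y in range(0, height, 2)
    ]


def build_pyramid(image, levels):
    """Level 0 is the original image; each later level is a downsampled copy."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break  # too small to halve again
        pyramid.append(downsample(current))
    return pyramid


# Toy 8x8 gradient image (illustrative values only).
image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
pixel_counts = [h * w for h, w in sizes]
```

With an 8x8 input, the levels are 8x8, 4x4, and 2x2, so each level holds one quarter of the pixels of the level below it.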



MIT License
Last updated on Mar 09, 2023 20:08 EST