[Computer Vision] Feature Description & Matching

Authored by Tony Feng

Created on Feb 14th, 2022

Last Modified on Feb 14th, 2022

Intro

This sereis of posts contains a summary of materials and readings from the course CSCI 1430 Computer Vision that I’ve taken @ Brown University. This course covers the topics of fundamentals of image formation, camera imaging geometry, feature detection and matching, stereo, motion estimation and tracking, image classification, scene understanding, and deep learning with neural networks. I posted these “Notes” (what I’ve learnt) for study and review only.

Local Feature Description

Before we’re able to start matching points in images, we must first find a way to meaningfully describe each key point. Harris Corners only tell us where the key points are and not what information exists at those points

Possible Feature Descriptors

Templates
- Intensity, gradients, or other self-defined features
Histograms
- Color, texture, SIFT descriptors, etc.

Histograms

A histogram is a plot where the x axis corresponds to pixel values (0-255 for 8-bit encoding), and the y axis measures to the frequency of pixels with values corresponding to the x-axis.

e.g. at an intensity of 50, we see that there are around 5500 pixels in the image with that intensity).

Scale Invariant Feature Transform

SIFT: At a high level, we want to compute a histogram around each key point using local information.

Algorithm

Find Harris corners scale-space extrema as feature point locations
Corner post-processing
- Subpixel position interpolation
- Discard low-contrast points
- Eliminate points along edges
Orientation estimation per feature point
Descriptor extraction

Scale-space Extrema

Find points whose surrounding patches (at some scale) are distinctive
Apply convolution with a Gaussian mask gives some idea of what is going on around a pixel
Gaussian masks have a natural scale - their standard deviation
- Difference of Gaussians

Automatic Scale Selection
Normalize the patch to fix size

Orientation Estimation

Gradient Orientation Histogram

To build this histogram, we bin the possible range of gradient orientations/directions (0 to 2pi radians) into a discrete number of bins (8 here) and, for each bin, we count the number of gradients that have those corresponding orientations.

Dominant Orientation $\theta$

We also compute an average gradient direction (black vector) of all the gradients in the local window; this is the dominant orientation of this neighborhood.

Orientation Normalization

We then rotate each region’s gradients so that all dominant gradient orientations point in the same direction (e.g. up). This will affect the orientation histogram by shifting the plot. We do this because we want the key point features to be invariant to rotation.

Keypoint Descriptor

A 16x16 neighbourhood around the keypoint is taken. It is divided into 16 sub-blocks of 4x4 size.

Actually 16x16, only showing 8x8

For each sub-block, 8 bin orientation histogram is created. So a total of 128 bin values are available.

It is represented as a vector to form keypoint descriptor. (16 cells * 8 orientations = 128 dimensional descriptor)

In addition to this, several measures are taken to achieve robustness against illumination changes, etc.

One trick we can employ to reduce the contribution of pixels farther from the keypoint is to apply a Gaussian filter over the entire 16x16 window.
Clamp all vector values > 0.2 to 0.2
Renormalization

Feature Matching

Basic Idea

It generally refers to computing distance between feature vectors to find correspondence.

Euclidean Distance
Cosine Similarity

Problems

Threshold is hard to pick.
Non-distinctive features could have lots of close matches, only one of which is correct.

NN Distance Ratio

Here, we introduce the approach by comparing distances of 2 nearest feature vector neighbor.

If $d_1 \approx d_2$, ratio $\frac{d_1}{d_2}$. (too close)
AS $d_1 \ll d_2$, ratio $\frac{d_1}{d_2}$ tends to 0

Too choose the ratio, we can compute a probability distribution functions of ratios based on keypoints with hand-labeled ground truth.

Other Local Descriptors

SURF
Shape Context
Self-similarity Descriptor