Authored by Tony Feng
Created on Feb 10th, 2022
Last Modified on Feb 14th, 2022
Intro
This series of posts contains a summary of materials and readings from the course CSCI 1430 Computer Vision that I’ve taken @ Brown University. This course covers the fundamentals of image formation, camera imaging geometry, feature detection and matching, stereo, motion estimation and tracking, image classification, scene understanding, and deep learning with neural networks. I posted these “Notes” (what I’ve learnt) for study and review only.
Overview
Edges and corners are typical examples of features: patterns that are unique to some part of the image. Local features describe one component of the image, not the entire image.
Corners are also called interest points or key points.
Corners are important because they give us sparse correspondences, which allow for fast matching between images.
Finding distinctive and repeatable feature points can be difficult when we want our features to be invariant to large transformations:
- geometric variation (translation, rotation, scale, perspective)
- appearance variation (reflectance, illumination)
Example: Panorama Stitching
Stitching a panorama requires correspondences. We have two images of a scene, and we can see that there is some overlap, but the images are not quite the same.
In this case, we need our features/corners to have the following properties: 1) distinctiveness, 2) repeatability, 3) compactness, 4) efficiency.
Local Feature Matching
- Detection:
  - Find a set of distinctive features (e.g., key points, edges).
- Description:
  - Extract a feature descriptor around each interest point as a vector.
- Matching:
  - Compute distances between feature vectors to find correspondences (see the sketch below).
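As a concrete illustration of the matching step, here is a minimal NumPy sketch that matches two descriptor arrays by Euclidean distance. The names (`match_features`, `desc1`, `desc2`) and the nearest-neighbor distance ratio threshold are my own illustrative choices, not values prescribed by the course.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.8):
    """Match descriptors by Euclidean distance with a ratio test.

    desc1: (N1, D) array of feature vectors from image 1.
    desc2: (N2, D) array of feature vectors from image 2 (N2 >= 2).
    Returns a list of (i, j) index pairs.
    """
    # Pairwise Euclidean distances, shape (N1, N2)
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)

    matches = []
    for i, row in enumerate(dists):
        first, second = np.argsort(row)[:2]  # two closest candidates
        # Keep the match only if the best candidate is clearly better
        # than the runner-up (nearest-neighbor distance ratio test).
        if row[first] < ratio * row[second]:
            matches.append((i, int(first)))
    return matches
```

The ratio test keeps a match only when the best candidate is clearly better than the runner-up, which is exactly what the distinctiveness property buys us.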
Corner Detection
Basic idea
We do not know which other image locations the feature will end up being matched against, but we can compute how stable a location’s appearance is with respect to small variations in its position. In other words, we can compare an image patch against its local neighbors.
We want a window shift in any direction to give a large change in intensity.
Auto-correlation
$$E(u, v)=\sum_{x, y} w(x, y)[I(x+u, y+v)-I(x, y)]^{2}$$
where $w(x, y)$ is the window function, $I(x+u, y+v)$ is the shifted intensity, and $I(x, y)$ is the original intensity. If our window goes outside the image, then we need to pad the image.
However, this approach is computationally expensive: it costs $O(W^2 S^2 I^2)$, where $W$, $S$, $I$ stand for window width, shift range, and image width, respectively.
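To make the cost concrete, below is a deliberately brute-force sketch of $E(u,v)$ for a single pixel; the function name, clamp-to-edge padding, and default window/shift sizes are illustrative assumptions.

```python
import numpy as np

def autocorrelation_surface(img, x0, y0, half_w=3, max_shift=5):
    """Brute-force E(u, v) around pixel (x0, y0) with a flat (box) window.

    Cost for ONE pixel is O(W^2 * S^2); repeating this at every pixel of
    the image gives the O(W^2 * S^2 * I^2) total mentioned above.
    """
    H, W = img.shape
    E = np.zeros((2 * max_shift + 1, 2 * max_shift + 1))
    for u in range(-max_shift, max_shift + 1):              # shift in x
        for v in range(-max_shift, max_shift + 1):          # shift in y
            err = 0.0
            for y in range(y0 - half_w, y0 + half_w + 1):       # window rows
                for x in range(x0 - half_w, x0 + half_w + 1):   # window cols
                    # Clamp-to-edge padding when the window leaves the image
                    ys, xs = min(max(y + v, 0), H - 1), min(max(x + u, 0), W - 1)
                    yc, xc = min(max(y, 0), H - 1), min(max(x, 0), W - 1)
                    err += (img[ys, xs] - img[yc, xc]) ** 2
            E[v + max_shift, u + max_shift] = err
    return E
```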
Quadratic Surface
We can approximate $E(u,v)$ locally by a quadratic surface, and look for that instead.
Taylor Series Expansion
A function $f$ can be represented by an infinite series built from its derivatives at a single point $a$.
$$ f(x) = f(a)+\frac{f^{\prime}(a)}{1 !}(x-a)+\frac{f^{\prime \prime}(a)}{2 !}(x-a)^{2}+\frac{f^{\prime \prime \prime}(a)}{3 !}(x-a)^{3}+\cdots $$
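As a quick numeric sanity check of the truncated (third-order) expansion, here is a toy example of my own, expanding $\cos$ around $a = 0$:

```python
import math

a, x = 0.0, 0.3
# Derivatives of cos at a = 0: f(0) = 1, f'(0) = 0, f''(0) = -1, f'''(0) = 0
taylor3 = 1 + 0 * (x - a) - (x - a) ** 2 / 2 + 0 * (x - a) ** 3 / 6
print(math.cos(x), taylor3)  # 0.9553... vs 0.9550 -- close near the expansion point
```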
Local Quadratic Approximation
The local quadratic approximation of $E(u,v)$ in the neighborhood of $(0,0)$ is given by the second-order Taylor expansion:
$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
where $M$ is a structure tensor (second moment matrix) computed from image derivatives:
$$M=\sum_{x, y} w(x, y)\begin{bmatrix} I_{x}^{2} & I_{x} I_{y} \\ I_{x} I_{y} & I_{y}^{2} \end{bmatrix}$$
Interpreting the Second Moment Matrix
- $M$ is symmetric around the diagonal
- Symmetric matrices have orthogonal eigenvectors
- $M$ is square
- A square matrix is diagonalizable if some matrix $R$ exists such that $M = R^{-1}AR$, where $A$ has only diagonal entries and $R$ represents a change of basis (in 2D, a rotation)
We want to decompose $M$ so that we can analyze the scale of these quadratic surfaces while ignoring the rotation. This means we care whether it is a corner or an edge, not about the orientation of the edge.
After decomposition, the eigenvalues $\lambda _1$ and $\lambda _2$ classify the region: both small means a flat region, one large means an edge, and both large means a corner. Alternatively, we can combine $\lambda _1$ and $\lambda _2$ with some constant $\alpha$ to compute a cornerness score:
$$ C={\lambda _1}{\lambda _2}-{\alpha}({\lambda _1}+{\lambda _2})^{2} $$
In linear algebra, for a diagonal matrix:
- The determinant is the product of its diagonal entries
- The trace is the sum of its diagonal entries
Therefore, we can avoid explicit eigenvalue computation by $$C=\operatorname{det}(A)-\alpha \operatorname{trace}(A)^{2}$$
Since $R^{-1}$ is the inverse matrix of $R$, we have $R^{-1}\cdot R = I$ and
$$\operatorname{det}(I) = \operatorname{det}(R\cdot R^{-1}) = \operatorname{det}(R)\cdot \operatorname{det}(R^{-1}) = 1$$
We can now obtain $\operatorname{det}(M) = \operatorname{det}(R^{-1}AR) = \operatorname{det}(R^{-1})\operatorname{det}(A)\operatorname{det}(R) = \operatorname{det}(A)$. Similarly, $\operatorname{trace}(M) = \operatorname{trace}(A)$, since the trace is invariant under a change of basis.
Thus,
$$C=\operatorname{det}(M)-\alpha \operatorname{trace}(M)^{2}$$
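As a quick check that the eigenvalue form and the det/trace form of the cornerness score agree, here is a small NumPy verification; the random symmetric matrix and the $\alpha$ value are illustrative (values around 0.04 to 0.06 are commonly cited for Harris).

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(2, 2))
M = B @ B.T                      # a random symmetric 2x2, like a structure tensor
alpha = 0.05

l1, l2 = np.linalg.eigvalsh(M)   # eigenvalues of the symmetric matrix
C_eig = l1 * l2 - alpha * (l1 + l2) ** 2
C_det = np.linalg.det(M) - alpha * np.trace(M) ** 2
print(np.isclose(C_eig, C_det))  # True: no explicit eigendecomposition needed
```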
Harris Corner Detector
- Input image: compute $M$ at each pixel
- Compute image derivatives $I_{x}$, $I_{y}$
- Compute $M$ components as squares of derivatives $I_{x}^{2}$, $I_{y}^{2}$
- Gaussian filter $g$ with width $s$: $g\left(I_{x}^{2}\right)$, $g\left(I_{y}^{2}\right)$, $g\left(I_{x} \circ I_{y}\right)$
- Compute cornerness $C=g\left(I_{x}^{2}\right) \circ g\left(I_{y}^{2}\right)-g\left(I_{x} \circ I_{y}\right)^{2} -\alpha\left[g\left(I_{x}^{2}\right)+g\left(I_{y}^{2}\right)\right]^{2}$
- Threshold $C$ to pick high cornerness
- Non-maximal suppression to pick peaks
Note: $a \circ b$ is Hadamard product (element-wise multiplication).
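Putting the steps together, here is a compact sketch of the detector using SciPy’s `gaussian_filter`; the derivative operator, $\sigma$, $\alpha$, and threshold are illustrative choices rather than values fixed by the algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris(img, sigma=1.0, alpha=0.05, thresh=1e-4):
    """Harris cornerness map following the steps above.

    img: 2D grayscale float array. Returns the cornerness map C and a
    boolean mask of thresholded 3x3 local maxima.
    """
    # 1. Image derivatives (central differences here; Sobel also works)
    Iy, Ix = np.gradient(img)

    # 2-3. Squared derivatives, smoothed by a Gaussian of width sigma
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)

    # 4. Cornerness C = det(M) - alpha * trace(M)^2, element-wise
    C = Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2

    # 5-6. Threshold, then non-maximal suppression over 3x3 neighborhoods
    peaks = (C > thresh) & (C == maximum_filter(C, size=3))
    return C, peaks
```

Non-maximal suppression here keeps only pixels that are the maximum of their 3×3 neighborhood, one simple way to pick peaks.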
Scale Invariant Detection
Invariance & Covariance
Invariance: the image is transformed and corner locations do not change.
Covariance / Equivariance: if we have two transformed versions of the same image, features should be detected in corresponding locations.
Properties of Harris Detector
- Corner response is invariant to image rotation
- Ellipse rotates but its shape (i.e. eigenvalues) remains the same
- Partial invariance to affine intensity change (see the quick check after this list)
- Invariance to intensity shift $I \rightarrow I + b$
- Still affected by intensity scaling $I \rightarrow aI$
- Non-invariant to image scale
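A tiny check of the intensity claims (the synthetic image and constants are arbitrary): a shift $I \rightarrow I + b$ drops out of the image derivatives, so $M$ and $C$ are unchanged, while scaling $I \rightarrow aI$ rescales the derivatives and hence the cornerness.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((32, 32))

Iy, Ix = np.gradient(img)
Iy_b, Ix_b = np.gradient(img + 50.0)  # intensity shift I -> I + b
Iy_a, Ix_a = np.gradient(2.0 * img)   # intensity scaling I -> aI

print(np.allclose(Ix, Ix_b))          # True: the shift b drops out
print(np.allclose(Ix, Ix_a))          # False: derivatives (and C) are rescaled
```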
Automatic Scale Selection
For a point in one image, we can consider its response (e.g., cornerness) as a function of region size (scale).
For both images, we want to choose a scale that corresponds to a peak of this function in each image. The scale at which the peaks occur is likely the best scale for finding the correspondences.
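One way to realize this automatically is to evaluate a scale-normalized signature at a point over a range of scales and take the peak. Below is a minimal sketch using the scale-normalized Laplacian of Gaussian as the signature, a common choice; the function name and scale range are my own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def best_scale(img, x, y, sigmas=np.linspace(1.0, 10.0, 19)):
    """Return the scale whose response peaks at pixel (x, y).

    Uses the scale-normalized Laplacian of Gaussian (sigma^2 * |LoG|) as
    the signature function; filtering the whole image for each scale is
    wasteful but keeps the sketch short.
    """
    responses = [s ** 2 * np.abs(gaussian_laplace(img, s))[y, x] for s in sigmas]
    return float(sigmas[int(np.argmax(responses))])
```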