[Computer Vision] Stereo Matching

Authored by Tony Feng

Created on Mar 7th, 2022

Last Modified on Mar 7th, 2022

Intro

This series of posts summarises materials and readings from CSCI 1430 Computer Vision, a course I took at Brown University. The course covers the fundamentals of image formation, camera imaging geometry, feature detection and matching, stereo, motion estimation and tracking, image classification, scene understanding, and deep learning with neural networks. I post these “Notes” (what I’ve learnt) for study and review only.

Stereo Pipeline

Calibrating cameras
Rectifying images (e.g. using the fundamental matrix estimated with the 8-point algorithm)
Correspondence
Estimate depth

Correspondence here means dense correspondence, i.e. for each point on the left image we find its correspondence on the right.

Correspondence allows measurement of disparity: the difference in the image coordinates of the projections of a given world point into each camera.

Basic Stereo Matching

Algorithm

Rectify the two stereo images to transform epipolar lines into scanlines
For each pixel $x$ in the left image
- Find the corresponding epipolar scanline in the right image
- Examine all pixels on the scanline and pick the best match $x'$
- Compute the disparity $x - x'$ and set the depth $Z = f \frac{T}{x - x'}$, where $f$ is the focal length and $T$ is the baseline between the two cameras
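
The per-pixel search above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the images are already rectified and stored as lists of rows of grey values, SSD is used as the match score, and the helper names (`ssd`, `match_pixel`, `depth_from_disparity`) as well as the parameters `half` and `max_disp` are my own, not from the course material.

```python
def ssd(left, right, y, xl, xr, half):
    """Sum of squared differences between two (2*half+1)^2 windows."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[y + dy][xl + dx] - right[y + dy][xr + dx]
            total += diff * diff
    return total


def match_pixel(left, right, y, x, half=1, max_disp=8):
    """Return the disparity d = x - x' with the lowest SSD for left pixel (y, x)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        xr = x - d                  # candidate x' on the same scanline
        if xr - half < 0:
            break                   # the window would leave the image
        cost = ssd(left, right, y, x, xr, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, f, T):
    """Z = f * T / d; zero disparity is treated as a point at infinity."""
    return float("inf") if d == 0 else f * T / d
```

The caller is expected to keep the window inside the image (`half <= y < height - half`, and likewise for `x`). The search only tries non-negative disparities, which assumes the first image comes from the left camera.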

Effect of Window Size

When computing SSD or normalised correlation, an image window is chosen around each point:

Smaller window: more detail but more noise
Larger window: smoother disparity maps but less detail

Problems

Window size is fixed across the image, but viewed objects differ in size and depth.
Uniform regions match equally well at every candidate position, so the best match is ambiguous.
Values in a dense disparity map are only reliable where there is some local variation in intensity, e.g. near edges.
Computing dense disparity in the spatial domain is computationally expensive.
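
To make the uniform-region problem concrete, here is a small sketch on 1-D scanlines. The helper names `ssd_1d` and `cost_curve` and the toy intensity values are made up for illustration. It compares the SSD cost over all candidate disparities for a textureless row and for a row containing a single edge.

```python
def ssd_1d(left_row, right_row, x, xr, half):
    """SSD between 1-D windows centred at x (left) and xr (right)."""
    return sum((left_row[x + k] - right_row[xr + k]) ** 2
               for k in range(-half, half + 1))


def cost_curve(left_row, right_row, x, half=1, max_disp=3):
    """SSD for every candidate disparity d = 0..max_disp at left pixel x."""
    return [ssd_1d(left_row, right_row, x, x - d, half)
            for d in range(max_disp + 1)]


flat = [50] * 10                                  # textureless scanline
edge = [10, 10, 10, 10, 10, 90, 90, 90, 90, 90]   # a single intensity step
edge_left = [10, 10] + edge[:-2]                  # same step, shifted by 2 px

flat_costs = cost_curve(flat, flat, x=6)          # [0, 0, 0, 0]
edge_costs = cost_curve(edge_left, edge, x=7)     # [6400, 6400, 0, 6400]
```

On the flat row every disparity has cost 0, so any of them could be picked. Next to the edge the cost has a single minimum at the true shift of 2 pixels.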

Stereo Constraints

So far, matches are independent for each point. What constraints or priors can we add?

Uniqueness

For any point in one image, there should be at most one matching point in the other image.

Ordering

Corresponding points should be in the same order in both views for most cases.

The ordering constraint doesn’t hold when occlusion occurs.

Smoothness

We expect disparity values to change slowly (for the most part).

Disparity Space Image

Idea

The DSI for one row holds the pairwise match scores between patches along that row in the left and right images.

DSI Formation
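
A minimal sketch of how the DSI for one scanline could be formed. It assumes SSD over small 1-D patches as the match score and clamps patches at the image border; both choices, as well as the names `build_dsi` and `half`, are mine and only illustrative. Rows index the right image and columns index the left image, the same convention used for the recurrence later in these notes.

```python
def build_dsi(left_row, right_row, half=1):
    """Return dsi[i][j]: SSD between the patch around right pixel i
    and the patch around left pixel j on the same scanline."""

    def patch(row, c):
        # Clamp indices so border pixels still get a full-size patch.
        last = len(row) - 1
        return [row[min(max(c + k, 0), last)] for k in range(-half, half + 1)]

    dsi = []
    for i in range(len(right_row)):
        p_right = patch(right_row, i)
        dsi.append([
            sum((a - b) ** 2 for a, b in zip(patch(left_row, j), p_right))
            for j in range(len(left_row))
        ])
    return dsi
```

For a scene at constant disparity `d`, the low-cost entries line up along the diagonal `j = i + d`.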

Goal

Assigning disparities to all pixels in the left scanline now amounts to finding a connected path through the DSI. We need to find the minimum-cost path through the matrix of all pairwise matches between two corresponding rasters.

Correspondence Search

As we traverse the scanline, there are three possibilities:

Pixels match, at a cost based on similarity
Left occlusion, at a cost associated with an unmatched pixel
Right occlusion, at a cost associated with an unmatched pixel

Assume that the rows and columns of the DSI index the right and left image, respectively. Then

$$C(i, j) = \min(C(i-1, j-1) + D(i, j), C(i-1, j) + OC, C(i, j-1) + OC) $$

where $C$ is the accumulated cost, $D$ is the dissimilarity, and $OC$ is the occlusion constant.
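
The recurrence can be sketched as a small dynamic program over one pair of scanlines. Several things here are assumptions of mine rather than part of the notes: the dissimilarity $D$ is the squared intensity difference of single pixels, the boundary conditions are $C(i, 0) = i \cdot OC$ and $C(0, j) = j \cdot OC$, and the function and variable names are hypothetical. Backtracking along the stored moves yields one disparity per left pixel, or `None` where the pixel is left unmatched.

```python
def scanline_dp(left_row, right_row, occlusion_cost):
    """Minimum-cost path through the DSI of one scanline pair.

    Rows (i) index the right image, columns (j) the left image.
    Returns (total_cost, disparity), where disparity[j] is x_left - x_right
    for matched left pixels and None for unmatched ones.
    """
    n_r, n_l = len(right_row), len(left_row)
    C = [[0.0] * (n_l + 1) for _ in range(n_r + 1)]
    move = [[None] * (n_l + 1) for _ in range(n_r + 1)]

    for i in range(1, n_r + 1):          # only right pixels consumed so far
        C[i][0] = i * occlusion_cost
        move[i][0] = "skip_right"
    for j in range(1, n_l + 1):          # only left pixels consumed so far
        C[0][j] = j * occlusion_cost
        move[0][j] = "skip_left"

    for i in range(1, n_r + 1):
        for j in range(1, n_l + 1):
            d = (right_row[i - 1] - left_row[j - 1]) ** 2
            candidates = [
                (C[i - 1][j - 1] + d, "match"),
                (C[i - 1][j] + occlusion_cost, "skip_right"),  # right pixel unmatched
                (C[i][j - 1] + occlusion_cost, "skip_left"),   # left pixel unmatched
            ]
            C[i][j], move[i][j] = min(candidates, key=lambda c: c[0])

    # Walk back from the bottom-right corner to recover the path.
    disparity = [None] * n_l
    i, j = n_r, n_l
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            disparity[j - 1] = (j - 1) - (i - 1)
            i, j = i - 1, j - 1
        elif step == "skip_right":
            i -= 1
        else:
            j -= 1
    return C[n_r][n_l], disparity
```

For example, with `right_row = [10, 20, 30, 40, 50, 60]`, `left_row = [99, 99, 10, 20, 30, 40]` and `occlusion_cost = 5`, the cheapest path skips the first two left pixels, matches the next four at disparity 2, and skips the last two right pixels, for a total cost of 20.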

Performance

Strengths

Produces good results in polynomial time
Can deal with occlusions

Weaknesses

Can be hard to find the right cost function
Hard to enforce consistency between neighbouring rasters in the vertical direction, since each scanline is solved independently
Must enforce the ordering constraint, which does not always hold