Camera Image Perspective Transformation to a Different Plane Using OpenCV

Objects In Mirror Are Closer Than They Appear

Image taken from: https://btgconsulting.biz/2019/12/10/warning-dates-on-the-calendar-are-closer-than-they-appear/

“Objects In Mirror Are Closer Than They Appear”. I am sure most of you have seen this disclaimer. Ever wondered why? Read along to find out!

Thanks to advances in camera technology, cameras can take shots from many angles and views. Different angles (for instance, high-angle shots from drones, bird's-eye views, or side views) and different lenses (such as fisheye lenses) project the scene onto the camera image plane in a way that differs from the actual positions of objects in the world. In photography, this phenomenon is called perspective distortion.

The disclaimer “Objects In Mirror Are Closer Than They Appear” is printed on side-view mirrors, which are usually tilted and slightly curved (convex). This curvature shrinks the reflected scene, so objects look smaller, and therefore farther away, than they really are: the image we see in the mirror is perspective-distorted.

Formally, perspective distortion is a warping or transformation of an object and its surrounding area that differs significantly from what the object would look like with a normal focal length, due to the relative scale of nearby and distant features. In this post, I will cover how we can correct these distortions to find the actual position of an object in the world, given a 2D camera image of it (under specific assumptions).

First, we need to understand how a 3D object in the world is projected onto the camera's image plane.

Fundamentally, light reflected from the object passes through the camera's pinhole and is projected onto the image plane inside the camera, where it forms an inverted image. To get the upright view, this image is flipped, which is equivalent to placing a virtual image plane in front of the pinhole (the virtual image plane in the image above).

Without going into the mathematical details, the object's 3D world coordinates are mapped onto the 2D image plane using similar triangles (shown below; read more here: https://lhoangan.github.io/camera-params/).
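As a minimal sketch of the similar-triangles relation, assume an ideal pinhole camera with focal length f and a point already expressed in camera coordinates (X, Y, Z) with Z > 0, ignoring lens distortion and the principal-point offset. The image coordinates are then u = f·X/Z and v = f·Y/Z. The function name `project_pinhole` is hypothetical:

```python
def project_pinhole(point_cam, f):
    """Project a 3D point in camera coordinates onto the virtual image plane.

    Ideal pinhole model (an assumption): by similar triangles,
    u / f = X / Z and v / f = Y / Z. Lens distortion and the
    principal-point offset are ignored.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


# Doubling the distance Z halves the point's offset from the image center.
print(project_pinhole((2.0, 1.0, 4.0), f=2.0))  # (1.0, 0.5)
print(project_pinhole((2.0, 1.0, 8.0), f=2.0))  # (0.5, 0.25)
```

The division by Z is what makes distant objects look smaller.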

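Mathematically, the 3D point is written in homogeneous coordinates (x, y, z, 1) and multiplied by a 3x4 matrix P (called the perspective matrix). The result is a homogeneous 2D point (su, sv, s); dividing by the scale factor s gives the coordinates (u, v) on the image plane:

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\qquad \text{(eq 1)}
$$

Point (x, y, z) in the 3D world coordinate system is converted to point (u, v) on the 2D camera image plane using the 3x4 perspective matrix.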

In this post, however, we want a matrix that goes the other way, from the 2D image plane to world coordinates. Intuitively, this is not possible: projecting onto the 2D image plane loses the third dimension, and it cannot be recovered without extra knowledge (that's why RGB-D cameras exist). Mathematically, the 3x4 perspective matrix is not square and hence not invertible.
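To make the loss of depth concrete, here is a small sketch with a hypothetical camera: focal length 800 px, principal point (320, 240), no rotation, and the world origin 5 units in front of the camera. These values are assumptions for illustration, not a calibrated camera. Two different world points on the same ray through the camera center project to the same pixel, so no matrix could map that pixel back to a unique 3D point:

```python
import numpy as np

# Hypothetical intrinsics K and extrinsics [R | t] (illustrative values only).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
P = K @ Rt  # 3x4 perspective matrix


def project(P, point_world):
    """Apply eq 1: multiply the homogeneous 3D point by P, then divide by s."""
    x_h = P @ np.append(point_world, 1.0)  # (s*u, s*v, s)
    return x_h[:2] / x_h[2]


# The camera center is at world (0, 0, -5); both points lie on one ray from it.
a = project(P, np.array([1.0, 0.5, 0.0]))
b = project(P, np.array([2.0, 1.0, 5.0]))
print(a, b)  # both land on pixel (480, 320): the depth is lost

# P maps 4 homogeneous inputs to 3 outputs, so it has no inverse.
print(P.shape)  # (3, 4)
```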

However, if we are only interested in the (x, y) world coordinates of an object lying on a plane with a fixed, known z, they can be recovered. For example, if a security camera is mounted at the top of a lift and we want the actual position of a person on the floor (taking the floor as the fixed plane z = 0) from the camera image, this is possible. There are two approaches.

1. Using the Inverse Perspective Matrix

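If the plane on which we want the (x, y) world coordinates is z = 0, every point on that floor plane has coordinates (x, y, 0). The third column of the perspective matrix in eq 1 is then always multiplied by 0 and can be dropped, which reduces P to a 3x3 matrix (writing p_ij for the entry in row i, column j of P):

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} p_{11} & p_{12} & p_{14} \\ p_{21} & p_{22} & p_{24} \\ p_{31} & p_{32} & p_{34} \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$

Unlike the 3x4 matrix, this square matrix is invertible in general; it becomes singular only in degenerate cases, such as when the camera center lies on the plane itself.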

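To get (x, y) from (u, v), we only need the inverse of this 3x3 matrix: multiply it by the homogeneous pixel (u, v, 1), then divide the result by its third component to remove the scale factor s. This gives the point on the floor plane that corresponds to the image point.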
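A minimal NumPy sketch of this inverse mapping, assuming a hypothetical 3x4 matrix P with illustrative values (not a calibrated camera). The helper name `image_to_floor` is also hypothetical:

```python
import numpy as np

# Hypothetical 3x4 perspective matrix (illustrative values only):
# K with f = 800 px and principal point (320, 240), R = I, t = (0, 0, 5).
P = np.array([[800.0, 0.0, 320.0, 1600.0],
              [0.0, 800.0, 240.0, 1200.0],
              [0.0, 0.0, 1.0, 5.0]])

# On the floor plane z = 0 the third column of P is multiplied by 0,
# so keeping columns 1, 2 and 4 gives a 3x3 matrix: (x, y, 1) -> s*(u, v, 1).
P_floor = P[:, [0, 1, 3]]
P_floor_inv = np.linalg.inv(P_floor)


def image_to_floor(u, v):
    """Map pixel (u, v) to floor coordinates (x, y) on the plane z = 0."""
    xy_h = P_floor_inv @ np.array([u, v, 1.0])
    return xy_h[:2] / xy_h[2]  # divide out the unknown scale factor s


print(image_to_floor(480.0, 320.0))  # approximately [1.0, 0.5]
```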

But what if we want the (x, y) coordinates on an arbitrary plane where z is not 0? We can use a homography to achieve that.

2. Using the Homography Matrix

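A homography matrix describes the projective transformation from one plane to another. If (u, v) is a point on one plane and (x, y) is the corresponding point on another plane, the transformation from (u, v) to (x, y) can be written as

$$
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\qquad \text{(eq 2)}
$$

where H is the 3x3 homography matrix and s is a scale factor, so (x, y) is obtained by dividing the first two components of H(u, v, 1)ᵀ by the third.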

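To estimate the homography matrix, we need at least 4 points on one plane and their corresponding points on the other plane. H has 9 entries but is only defined up to scale, which leaves 8 unknowns, and each point pair contributes 2 equations. The 4 points must also be in general position (no three of them collinear).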
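To see where the "at least 4 points" requirement comes from, here is a sketch of the basic Direct Linear Transform (DLT) in NumPy. It is a textbook technique shown for illustration, not necessarily what cv2.findHomography does internally, and it omits the coordinate normalization a robust implementation would add. The name `homography_dlt` is hypothetical:

```python
import numpy as np


def homography_dlt(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from >= 4 point pairs (basic DLT).

    Each pair (u, v) -> (x, y) gives two linear equations in the 9 entries
    of H. H is only defined up to scale (8 degrees of freedom), so 4 pairs
    in general position are the minimum.
    """
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2, 2] != 0)


src = [(0, 0), (1, 0), (1, 1), (0, 1)]  # unit square
dst = [(0, 0), (2, 0), (2, 1), (0, 1)]  # the same square stretched 2x along x
H = homography_dlt(src, dst)
print(np.round(H, 6))  # approximately diag(2, 1, 1)
```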

Given the perspective matrix P, we can easily generate these 4 correspondences between any 3D world plane of fixed z and the 2D image plane: choose 4 points (x, y, z) on the world plane and project each of them with eq 1.

In OpenCV, the cv2.findHomography function estimates the homography matrix from 4 or more corresponding points in the source and destination planes.


Once you have the homography matrix, eq 2 can be used to map points from the image plane to the desired plane.
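Applying eq 2 is a matrix product followed by division by the third homogeneous coordinate. A small NumPy sketch; the helper `apply_homography` and the matrix H are hypothetical, with H chosen so that the division visibly matters:

```python
import numpy as np


def apply_homography(H, points):
    """Apply eq 2 to an (N, 2) array of points: s*(x, y, 1) = H @ (u, v, 1)."""
    pts = np.asarray(points, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous, (N, 3)
    mapped = pts_h @ H.T                              # row-wise H @ p
    return mapped[:, :2] / mapped[:, 2:3]             # divide out s


# Hypothetical homography with a non-trivial last row.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.001, 1.0]])
print(apply_homography(H, [[0, 0], [100, 0], [0, 1000]]))
# roughly [[0, 0], [100, 0], [0, 500]]
```

OpenCV's cv2.perspectiveTransform performs the same per-point mapping, and cv2.warpPerspective applies H to a whole image instead of individual points.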

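A word of caution: these methods say nothing about an object's position along the world z-axis; they simply assume it is fixed. A point that does not actually lie on the chosen plane will therefore be mapped to the wrong (x, y) location.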
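A quick sketch of what goes wrong when that assumption is violated, using a hypothetical perspective matrix P with illustrative values (in this setup the camera looks along +z, so larger z is farther from the camera). A point 1 unit off the floor plane, mapped back with the z = 0 inverse, lands at the wrong (x, y):

```python
import numpy as np

# Hypothetical 3x4 perspective matrix (illustrative values only).
P = np.array([[800.0, 0.0, 320.0, 1600.0],
              [0.0, 800.0, 240.0, 1200.0],
              [0.0, 0.0, 1.0, 5.0]])
P_floor_inv = np.linalg.inv(P[:, [0, 1, 3]])  # image -> floor plane z = 0


def project(point_world):
    """World (x, y, z) -> pixel (u, v) via eq 1."""
    x_h = P @ np.append(point_world, 1.0)
    return x_h[:2] / x_h[2]


def image_to_floor(uv):
    """Pixel (u, v) -> (x, y), assuming the point lies on z = 0."""
    xy_h = P_floor_inv @ np.append(uv, 1.0)
    return xy_h[:2] / xy_h[2]


on_floor = np.array([1.0, 0.5, 0.0])
off_floor = np.array([1.0, 0.5, 1.0])  # same (x, y), but 1 unit off the plane

print(image_to_floor(project(on_floor)))   # approximately [1.0, 0.5]: correct
print(image_to_floor(project(off_floor)))  # approximately [0.833, 0.417]: wrong
```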

TL;DR:

  • For perspective transformation from the image plane to a fixed plane in world coordinates, two methods can be used.
  • If the equation of the plane is z = 0, the 3x4 perspective matrix can be reduced to a 3x3 matrix by dropping its 3rd column entirely, and the inverse of this 3x3 matrix can be used.
  • For any other arbitrary plane, the homography matrix should be used. To find it with cv2.findHomography, at least 4 corresponding points on both planes are required.
