Understanding the view and projection matrix from pybullet

Question:

When rendering images in pybullet, one has to use getCameraImage, which takes a view matrix and a projection matrix as inputs (pybullet also has helper functions to generate those matrices). In theory, the projection matrix should be P = K[R|t], which can be rewritten as P = [M|-MC], so we could apply an RQ decomposition to M, where the R factor is an upper triangular matrix, and thus recover K and [R|t] from the projection matrix (keeping in mind that the R from the RQ decomposition is not the R from [R|t]). But when I use, for example, scipy.linalg.rq, the result is not a valid intrinsic matrix K.

Can someone explain how exactly the projection matrix is defined and what the view matrix is in pybullet, and how we can retrieve the intrinsic and extrinsic parameters from those matrices?

Asked By: Mike W


Answers:

So pybullet usually constructs the projection matrix (source code) from the field of view (FOV, in radians) as

    P = | 1/(aspect*tan(fov/2))  0             0                      0                     |
        | 0                      1/tan(fov/2)  0                      0                     |
        | 0                      0             (far+near)/(near-far)  2*far*near/(near-far) |
        | 0                      0             -1                     0                     |

and the intrinsic matrix is defined as

    K = | k*f  s    p_x |
        | 0    l*f  p_y |
        | 0    0    1   |

Here p_x and p_y are the coordinates of the principal point, usually the centre of the image, and s is the skew parameter. So there are a few differences:

  1. The dimensions. Pybullet adds a third row (not a fourth) and a fourth column to keep depth information.
  2. Even setting that extra row aside, the element (2, 2) (zero-indexed) is not 1, as it would be in K.
  3. Pybullet uses a skew parameter of 0.
  4. Pybullet does not take the focal length as an input (it effectively uses one, but computes it from the FOV).
  5. Pybullet assumes p_x = p_y = 0.
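The listed differences can be made concrete by writing both matrices out numerically. The sketch below (with an arbitrarily chosen FOV, image size, and clipping planes) builds pybullet's OpenGL-style matrix next to a conventional pinhole K:

```python
import numpy as np

fov = np.deg2rad(60)      # vertical field of view in radians
aspect = 320 / 240        # width / height
near, far = 0.2, 10.0

# pybullet / OpenGL perspective matrix (written row-major here)
P_gl = np.array([
    [1 / (aspect * np.tan(fov / 2)), 0, 0, 0],
    [0, 1 / np.tan(fov / 2), 0, 0],
    [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
    [0, 0, -1, 0],
])

# conventional pinhole intrinsics: focal length in pixels, zero skew,
# principal point at the image centre
f = 240 / (2 * np.tan(fov / 2))
K = np.array([
    [f, 0, 160],
    [0, f, 120],
    [0, 0, 1],
])

print(P_gl[0, 0], P_gl[1, 1])   # ~1.299, ~1.732
print(K[0, 0])                  # ~207.85 px
```

Note how K has the focal length in pixels and the principal point in the last column, while the OpenGL matrix has both axes rescaled and an extra row/column carrying depth.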

First, pybullet follows the conventions of OpenGL, so it uses column-major order (read more), meaning the first index refers to the column, not the row. Therefore the projection matrix returned by pybullet should be transposed.
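As a quick sketch, the flat 16-tuple that pybullet returns can be reshaped and transposed with numpy to recover the conventional row-major matrix (the values below are the example output shown further down, for fov=60, aspect=4/3, near=0.2, far=10):

```python
import numpy as np

# flat, column-major tuple as returned by computeProjectionMatrixFOV
proj_flat = (1.299038052558899, 0.0, 0.0, 0.0,
             0.0, 1.7320507764816284, 0.0, 0.0,
             0.0, 0.0, -1.040816307067871, -1.0,
             0.0, 0.0, -0.40816324949264526, 0.0)

# reshape fills row by row, so a transpose undoes the column-major layout
P = np.asarray(proj_flat).reshape(4, 4).T

print(P[2, 3])   # 2*far*near/(near-far), now in the expected position
print(P[3, 2])   # the -1 of the OpenGL perspective matrix
```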

Second, the full equation to convert the FOV to a focal length f (for an image of height h) is:

    tan(fov/2) = (h/2) / f   =>   f = h / (2 * tan(fov/2))

therefore pybullet is multiplying the focal length by 2/h. The reason is that pybullet uses Normalized Device Coordinates (NDC), which clip values to [-1, 1]: dividing x by the width maps it to [0, 1], multiplying by 2 maps it to [0, 2], and if the principal point is the middle of the image, shifting by (1, 1) maps it to [-1, 1]. Pybullet's focal length is therefore the proper focal length expressed in NDC.
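A quick numeric check of this relation (a sketch; the FOV and image height are chosen arbitrarily):

```python
import numpy as np

fov = np.deg2rad(60)   # vertical field of view in radians
h = 240                # image height in pixels

# focal length in pixels recovered from the FOV
f = h / (2 * np.tan(fov / 2))

# pybullet's matrix entry is this focal length scaled by 2/h (NDC)
ndc_f = f * 2 / h
print(ndc_f, 1 / np.tan(fov / 2))   # both ~1.732
```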

The non-zero values in the third column of the projection matrix are there to map z values in OpenGL so we can ignore them.

k and l in the K matrix are the mm-to-px conversion ratios; if we are using pybullet we can set k = l = 1.
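Putting the pieces together, the intrinsics can be read straight back out of a pybullet projection matrix, with no RQ decomposition needed. This is a sketch assuming the render width and height are known; it inverts the NDC scaling described above, and the sample matrix is the example output shown further down (320x240 image, fov=60):

```python
import numpy as np

def intrinsics_from_pybullet(proj_flat, w, h):
    """Recover f_x, f_y, c_x, c_y from pybullet's flat projection matrix
    by inverting the NDC scaling:
      P[0,0] = 2*f_x/w,        P[1,1] = 2*f_y/h,
      P[0,2] = (w - 2*c_x)/w,  P[1,2] = (2*c_y - h)/h
    """
    P = np.asarray(proj_flat).reshape(4, 4).T   # undo column-major layout
    f_x = P[0, 0] * w / 2
    f_y = P[1, 1] * h / 2
    c_x = (w - P[0, 2] * w) / 2
    c_y = (P[1, 2] * h + h) / 2
    return f_x, f_y, c_x, c_y

proj = (1.299038052558899, 0.0, 0.0, 0.0,
        0.0, 1.7320507764816284, 0.0, 0.0,
        0.0, 0.0, -1.040816307067871, -1.0,
        0.0, 0.0, -0.40816324949264526, 0.0)
print(intrinsics_from_pybullet(proj, 320, 240))
# ~ (207.85, 207.85, 160.0, 120.0): focal lengths in px, centred principal point
```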

Some useful resources are [1], [2] and [3].

Answered By: Mike W

pybullet has two functions to construct the projection matrix:

def computeProjectionMatrixFOV(fov, aspect, near, far) 

def computeProjectionMatrix(left, right, bottom, top, near, far)

Converting from the FOV parameters (with fov in radians) to the other set is done as follows [1]:

top    = tan(fov/2)*near
bottom = -top
right  = top*aspect
left   = -right

Aspect ratio is defined as: aspect = width/height

Test in pybullet:

import pybullet as p
import numpy as np

fov = 60
width = 320
height = 240
aspect = width / height
near = 0.2
far = 10

projectionMatrix = p.computeProjectionMatrixFOV(fov, aspect, near, far)
print(projectionMatrix)

top = np.tan(np.deg2rad(fov)/2)*near
bottom = -top
right = top * aspect
left = -right

projectionMatrix = p.computeProjectionMatrix(left, right, bottom, top, near, far)
print(projectionMatrix)

with the outputs:

(1.299038052558899, 0.0, 0.0, 0.0, 0.0, 1.7320507764816284, 0.0, 0.0, 0.0, 0.0, -1.040816307067871, -1.0, 0.0, 0.0, -0.40816324949264526, 0.0)
(1.299038052558899, 0.0, 0.0, 0.0, 0.0, 1.732050895690918, 0.0, 0.0, 0.0, 0.0, -1.040816307067871, -1.0, 0.0, 0.0, -0.40816324949264526, 0.0)

Answered By: Songyan XIN

I searched extensively for a compact answer on constructing the view and projection matrices from a calibrated K and ROS TF extrinsic poses, but to my amusement I found none.

I wrote and tested the following two functions, which compute the matrices required for simulating real cameras in pybullet. Hopefully they will be useful:

from pyquaternion import Quaternion
import numpy as np


def cvK2BulletP(K, w, h, near, far):
    """
    cvK2BulletP converts the intrinsic matrix K, as calibrated using OpenCV
    and ROS, to the projection matrix used in OpenGL and pybullet.

    :param K:    OpenCV 3x3 camera intrinsic matrix
    :param w:    Image width
    :param h:    Image height
    :param near: The nearest objects to be included in the render
    :param far:  The furthest objects to be included in the render
    :return:     4x4 projection matrix as used in OpenGL and pybullet
    """
    f_x = K[0,0]
    f_y = K[1,1]
    c_x = K[0,2]
    c_y = K[1,2]
    A = (near + far)/(near - far)
    B = 2 * near * far / (near - far)

    projection_matrix = [
                        [2/w * f_x,  0,          (w - 2*c_x)/w,  0],
                        [0,          2/h * f_y,  (2*c_y - h)/h,  0],
                        [0,          0,          A,              B],
                        [0,          0,          -1,             0]]
    # The transpose is needed to respect OpenGL's column-major array structure
    return np.array(projection_matrix).T.reshape(16).tolist()


def cvPose2BulletView(q, t):
    """
    cvPose2BulletView takes an orientation and position as used
    in ROS-TF and OpenCV and converts them to the view matrix used
    in OpenGL and pybullet.

    :param q: ROS orientation expressed as a quaternion [qx, qy, qz, qw]
    :param t: ROS position expressed as [tx, ty, tz]
    :return:  4x4 view matrix as used in pybullet and OpenGL
    """
    q = Quaternion([q[3], q[0], q[1], q[2]])
    R = q.rotation_matrix

    T = np.vstack([np.hstack([R, np.array(t).reshape(3,1)]),
                              np.array([0, 0, 0, 1])])
    # Convert the OpenCV camera convention to the OpenGL camera convention
    # by a 180 degree rotation around the X axis
    Tc = np.array([[1,   0,    0,  0],
                   [0,  -1,    0,  0],
                   [0,   0,   -1,  0],
                   [0,   0,    0,  1]]).reshape(4,4)
    
    # The pybullet pose is the inverse of the pose from ROS-TF
    T = Tc @ np.linalg.inv(T)
    # The transpose is needed to respect OpenGL's column-major array structure
    viewMatrix = T.T.reshape(16)
    return viewMatrix

The above two functions give you the required matrices to get images from the pybullet environment like this:

projectionMatrix = cvK2BulletP(K, w, h, near, far)
viewMatrix = cvPose2BulletView(q, t)

_, _, rgb, depth, segmentation = p.getCameraImage(w, h, viewMatrix, projectionMatrix, shadow=True)

The above returns images without distortion. For that, you could use the answer provided earlier.
