How to estimate the extrinsic matrix from a chessboard image and project it to a bird's eye view such that each pixel has a fixed size in meters?

Question:

I want to generate an Occupancy Grid (OG)-like image in a Bird’s Eye View (BEV), i.e., each image pixel has a constant size in meters and everything on the final grid is floor (height=0).

I don’t know what I’m missing. I’m a newbie on the subject and I’m trying to follow a pragmatic, step-by-step approach to get to the final result. I have spent a lot of time on this and I’m still getting poor results. I’d appreciate any help. Thanks.

To get the desired result, I follow this pipeline:

  1. Estimate the extrinsic matrix with cv2.solvePnP and a chessboard image.
  2. Generate the OG grid XYZ world coordinates (X=right, Y=height, Z=forward).
  3. Transform the OG grid XYZ world coordinates into camera coordinates with the extrinsic matrix.
  4. Map the OG grid camera coordinates to uv image coordinates.
  5. Populate the OG image with the uv pixels.
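For reference, steps 3 and 4 follow the standard pinhole model. Writing K for mtx and [R | t] for the top three rows of RT, a world point maps to pixel (u, v) as below (lens distortion ignored, as in step 4), where s is the camera-frame depth. Because every OG cell lies on the floor (Y = 0), the second column of R drops out and the mapping reduces to a 3x3 homography H acting on (X, Z):

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \,[\,R \mid t\,] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
Y = 0 \;\Rightarrow\;
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \underbrace{K \,[\,r_1 \;\; r_3 \;\; t\,]}_{H}
  \begin{bmatrix} X \\ Z \\ 1 \end{bmatrix}
```

Here r_1 and r_3 are the first and third columns of R. Note that the homogeneous 1 must stay 1: it is what multiplies t.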

I have the following intrinsic matrix and distortion coefficients, which I previously estimated from 10 other chessboard images like the one below:

1. Estimate the extrinsic matrix

import numpy as np
import cv2
import matplotlib.pyplot as plt


mtx = np.array([[2029,    0, 2029],
                [   0, 1904, 1485],
                [   0,    0,    1]]).astype(float)

dist = np.array([[-0.01564965,  0.03250585,  0.00142366,  0.00429703, -0.01636045]])

[Image: chessboard test image]

impath = '....'
img = cv2.imread(impath)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
CHECKERBOARD = (5, 8)
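ret, corners = cv2.findChessboardCorners(gray, CHECKERBOARD, None)
assert ret, 'chessboard corners not found'  # corners is None otherwise, and cornerSubPix would fail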
corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)

objp = np.concatenate(
            np.meshgrid(np.arange(-4, 4, 1),
                        0,
                        np.arange(0, 5, 1), 
                        )
        ).astype(float)

objp = np.moveaxis(objp, 0, 2).reshape(-1, 3)

square_size = 0.029
objp *= square_size
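To make the ordering of objp explicit: with np.meshgrid's default indexing='xy', the call above yields 8 x 5 = 40 points with X changing slowest, Z fastest, and Y = 0 everywhere. The numpy-only sketch below (helper names are hypothetical) rebuilds the same array with explicit loops, which is a quick way to confirm that the object points line up with the order of the detected corners before calling solvePnP.

```python
import numpy as np


def chessboard_object_points(square_size=0.029):
    """objp built with meshgrid, exactly as in the question."""
    objp = np.concatenate(
        np.meshgrid(np.arange(-4, 4, 1), 0, np.arange(0, 5, 1))
    ).astype(float)
    objp = np.moveaxis(objp, 0, 2).reshape(-1, 3)
    return objp * square_size


def chessboard_object_points_loops(square_size=0.029):
    """Same points via explicit loops: X is the outer (slow) loop, Z the inner (fast) one."""
    pts = [(x, 0.0, z) for x in range(-4, 4) for z in range(0, 5)]
    return np.array(pts, dtype=float) * square_size


a = chessboard_object_points()
b = chessboard_object_points_loops()
print(a.shape)            # (40, 3)
print(np.allclose(a, b))  # True
```

If the corner order returned by findChessboardCorners runs the other way (which is why the question uses corners[::-1]), swapping the loop order or reversing one of the ranges is the equivalent fix on the object-point side.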

ret, rvec, tvec = cv2.solvePnP(objp, corners[::-1], mtx, dist)
print('rvec:', rvec.T)
print('tvec:', tvec.T)

# img_withaxes = cv2.drawFrameAxes(img.copy(), mtx, dist, rvec, tvec, square_size, 3)
# plt.imshow(cv2.resize(img_withaxes[..., ::-1], (800, 600)))


# rvec: [[ 0.15550242 -0.03452503 -0.028686  ]]
# tvec: [[0.03587237 0.44082329 0.62490573]]
R = cv2.Rodrigues(rvec)[0]
RT = np.eye(4)
RT[:3, :3] = R
RT[:3, 3] = tvec.ravel()
RT.round(2)

# array([[-1.  ,  0.03,  0.04,  0.01],
#        [ 0.03,  0.99,  0.15, -0.44],
#        [-0.03,  0.16, -0.99,  0.62],
#        [ 0.  ,  0.  ,  0.  ,  1.  ]])
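As a sanity check on the solvePnP output, here is a numpy-only sketch of what cv2.Rodrigues computes (rotation vector to rotation matrix), how RT is assembled, and how to recover the camera centre in world coordinates as C = -R^T t. Since the board lies in the plane Y = 0, |C_y| is the camera's distance to the board plane in meters, which is easy to compare against a tape measure. The helper names are hypothetical; the example values are the rvec/tvec printed in the question.

```python
import numpy as np


def rodrigues(rvec):
    """Rotation vector (axis * angle) -> 3x3 rotation matrix (Rodrigues' formula)."""
    rvec = np.asarray(rvec, dtype=float).ravel()
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def extrinsic_matrix(rvec, tvec):
    """Assemble the 4x4 world->camera matrix RT = [[R, t], [0, 1]]."""
    RT = np.eye(4)
    RT[:3, :3] = rodrigues(rvec)
    RT[:3, 3] = np.asarray(tvec, dtype=float).ravel()
    return RT


def camera_center(RT):
    """Camera position in world coordinates: C = -R^T t."""
    R, t = RT[:3, :3], RT[:3, 3]
    return -R.T @ t


# rvec and tvec as printed in the question.
rvec = np.array([0.15550242, -0.03452503, -0.028686])
tvec = np.array([0.03587237, 0.44082329, 0.62490573])
RT = extrinsic_matrix(rvec, tvec)
C = camera_center(RT)
print('camera centre (world, m):', C.round(3))
print('distance to the board plane Y=0 (m):', abs(C[1]).round(3))
```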

2. Generate the OG grid XYZ world coordinates (X=right, Y=height, Z=forward).

uv_dims = img.shape[:2] # h, w
grid_dims = (500, 500) # h, w

og_grid = np.concatenate(
                np.meshgrid(
                    np.arange(- grid_dims[0] // 2, (grid_dims[0] + 1) // 2, 1),
                    0, # I want only the floor information, such that height = 0
                    np.arange(grid_dims[1]),
                    1
                    )
                )
og_grid = np.moveaxis(og_grid, 0, 2)

edge_size = .1
og_grid_3dcoords = og_grid * edge_size
print(og_grid_3dcoords.shape)

# (500, 500, 4, 1)

3. Transform the OG grid XYZ world coordinates into camera coordinates with the extrinsic matrix.

og_grid_camcoords = (RT @ og_grid_3dcoords.reshape(-1, 4).T)
og_grid_camcoords = og_grid_camcoords.T.reshape(grid_dims + (4,))
og_grid_camcoords /= og_grid_camcoords[..., [2]]
og_grid_camcoords = og_grid_camcoords[..., :3]

# Print for debugging issues
for i in range(og_grid_camcoords.shape[-1]):
    print(np.quantile(og_grid_camcoords[..., i].clip(-10, 10), np.linspace(0, 1, 11)).round(1))

# [-10.   -1.3  -0.7  -0.4  -0.2  -0.    0.2   0.4   0.6   1.2  10. ]
# [-10.   -0.2  -0.2  -0.2  -0.2  -0.2  -0.1  -0.1  -0.1  -0.1  10. ]
# [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
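The +/-10 extremes in the clipped debug print suggest that some normalized coordinates are very large. That typically happens for grid points at or behind the camera (camera-frame Z <= 0), where dividing by Z either blows up or flips the point back into the image. A minimal numpy sketch (hypothetical helper name and example extrinsics) that applies RT, keeps a mask of points in front of the camera, and performs the perspective divide only for them:

```python
import numpy as np


def world_to_normalized(RT, points_h, min_depth=1e-6):
    """Map homogeneous world points (N, 4) to normalized image coords (N, 2).

    Returns (xy, in_front): xy holds x/z and y/z in the camera frame, and
    in_front flags points with positive depth. Points at or behind the
    camera get NaN instead of a meaningless division result.
    """
    cam = (RT @ points_h.T).T          # (N, 4) camera-frame homogeneous coords
    cam = cam[:, :3] / cam[:, [3]]     # de-homogenize (W stays 1 when RT's last row is [0, 0, 0, 1] and input W is 1)
    z = cam[:, 2]
    in_front = z > min_depth
    xy = np.full((len(cam), 2), np.nan)
    xy[in_front] = cam[in_front, :2] / z[in_front, None]
    return xy, in_front


# Hypothetical example: identity rotation, world origin 2 m in front of the camera.
RT = np.eye(4)
RT[:3, 3] = [0.0, 0.0, 2.0]
pts = np.array([[0.0, 0.0, 0.0, 1.0],     # depth 2, straight ahead
                [1.0, 0.0, 2.0, 1.0],     # depth 4
                [0.0, 0.0, -3.0, 1.0]])   # depth -1, behind the camera
xy, ok = world_to_normalized(RT, pts)
print(ok)  # [ True  True False]
print(xy)
```

The mask can then be carried through steps 4 and 5 so that cells the camera cannot see stay empty.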

4. Map the OG grid camera coordinates to uv image coordinates.

og_grid_uvcoords = (mtx @ og_grid_camcoords.reshape(-1, 3).T)
og_grid_uvcoords = og_grid_uvcoords.T.reshape(grid_dims + (3,))
og_grid_uvcoords = og_grid_uvcoords.clip(0, max(uv_dims)).round().astype(int)
og_grid_uvcoords = og_grid_uvcoords[..., :2]

# Print for debugging issues
for i in range(og_grid_uvcoords.shape[-1]):
    print(np.quantile(og_grid_uvcoords[..., i], np.linspace(0, 1, 11)).round(1))

# [   0.    0.  665. 1134. 1553. 1966. 2374. 2777. 3232. 4000. 4000.]
# [   0. 1134. 1161. 1171. 1181. 1191. 1201. 1212. 1225. 1262. 4000.]
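Step 4 multiplies by mtx only, so the dist coefficients estimated earlier are not used. With five coefficients (k1, k2, p1, p2, k3), OpenCV applies the radial/tangential model below to the normalized coordinates before the intrinsics, and cv2.projectPoints performs extrinsics, distortion, and intrinsics in one call. A numpy-only sketch of that model (hypothetical helper name); with coefficients this small the effect may be modest, so treat it as a correctness detail rather than a guaranteed fix:

```python
import numpy as np


def project_normalized(xy, mtx, dist):
    """Normalized camera coords (N, 2) -> pixel coords (N, 2) using OpenCV's
    5-coefficient distortion model (k1, k2, p1, p2, k3)."""
    k1, k2, p1, p2, k3 = np.asarray(dist, dtype=float).ravel()[:5]
    x, y = xy[:, 0], xy[:, 1]
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    u = mtx[0, 0] * x_d + mtx[0, 2]   # fx * x'' + cx
    v = mtx[1, 1] * y_d + mtx[1, 2]   # fy * y'' + cy
    return np.stack([u, v], axis=1)


mtx = np.array([[2029.0, 0.0, 2029.0],
                [0.0, 1904.0, 1485.0],
                [0.0, 0.0, 1.0]])
dist = np.array([[-0.01564965, 0.03250585, 0.00142366, 0.00429703, -0.01636045]])

xy = np.array([[0.0, 0.0], [0.5, -0.2]])
print(project_normalized(xy, mtx, dist).round(1))
```

With all coefficients set to zero the function reduces to mtx @ [x, y, 1], i.e. exactly what step 4 computes.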

Clip the uv values to the image boundaries.

mask_clip_height = (og_grid_uvcoords[..., 1] >= uv_dims[0])
og_grid_uvcoords[mask_clip_height, 1] = uv_dims[0] - 1

mask_clip_width = (og_grid_uvcoords[..., 0] >= uv_dims[1])
og_grid_uvcoords[mask_clip_width, 0] = uv_dims[1] - 1
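Clamping maps every out-of-view grid cell onto the image border, so those cells get filled with repeated border pixels instead of staying empty. An alternative, sketched below with a hypothetical helper name and the same (h, w) convention as uv_dims, is to keep a boolean mask of the cells that land inside the image and leave the others black. The mask has to be computed on the unclipped uv values, i.e. before the clip in step 4.

```python
import numpy as np


def inside_image(uv, uv_dims):
    """Boolean mask of (..., 2) integer uv coords (u = column, v = row)
    that fall inside an image of shape (h, w)."""
    h, w = uv_dims
    u, v = uv[..., 0], uv[..., 1]
    return (u >= 0) & (u < w) & (v >= 0) & (v < h)


uv = np.array([[10, 20], [-1, 5], [639, 479], [640, 0]])
print(inside_image(uv, (480, 640)))  # [ True False  True False]
```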

5. Populate the OG image with the uv pixels.

og = np.zeros(grid_dims + (3,)).astype(int)

for i, (u, v) in enumerate(og_grid_uvcoords.reshape(-1, 2)):
    og[i % grid_dims[1], i // grid_dims[1], :] = img[v, u]

plt.imshow(og[..., ::-1])  # img was loaded by OpenCV as BGR; reverse the channels for matplotlib (RGB)
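The per-pixel Python loop in step 5 can be replaced with NumPy fancy indexing. The sketch below (hypothetical helper name, optional validity mask) gathers all pixels in one step and reproduces the loop's layout og[i % B, i // B] = img[v, u], which is a transpose of the uv grid; it checks itself against the loop on a small random example with a non-square grid.

```python
import numpy as np


def populate_og(img, uv, valid=None):
    """Vectorized version of the step-5 loop.

    uv has shape (A, B, 2) holding (u, v) pixel indices; the result has shape
    (B, A, 3) and equals the loop og[i % B, i // B] = img[v, u].
    Cells where `valid` is False are left black.
    """
    u, v = uv[..., 0], uv[..., 1]
    og = img[v, u]                           # (A, B, 3), gathered in one step
    if valid is not None:
        og = np.where(valid[..., None], og, 0)
    return np.swapaxes(og, 0, 1)             # same transposed layout as the loop


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(40, 60, 3)).astype(np.uint8)
A, B = 5, 7
uv = np.stack([rng.integers(0, 60, size=(A, B)),   # u = column
               rng.integers(0, 40, size=(A, B))],  # v = row
              axis=-1)

# Reference: the loop from the question, on a non-square grid.
og_loop = np.zeros((B, A, 3), dtype=np.uint8)
for i, (u, v) in enumerate(uv.reshape(-1, 2)):
    og_loop[i % B, i // B, :] = img[v, u]

print(np.array_equal(populate_og(img, uv), og_loop))  # True
```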

[Image: the resulting OG image]

I was expecting a top-down view of the test image.

Asked By: Rafael Toledo


Answers:

In the end, it turned out that I had made a mistake: the homogeneous coordinate "1" of the world homogeneous coordinates was also being scaled by edge_size in step 2 of the pipeline. Fixing this and reversing the order of the z-axis values in the OG grid mesh yielded the BEV image I expected.
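Why the scaled homogeneous coordinate matters: multiplying all four entries of [X, 0, Z, 1] by edge_size describes the same projective point, so after RT and the division by Z the factor cancels. The buggy grid was therefore effectively laid out in 1 m steps instead of edge_size steps (500 m across instead of 50 m). A small numpy demonstration with made-up extrinsics:

```python
import numpy as np


def normalized(RT, ph):
    """Apply RT and divide by camera-frame Z, as in step 3 of the question."""
    c = RT @ ph
    return c[:3] / c[2]


edge_size = 0.1

# Made-up extrinsics: a small tilt about X plus a translation.
a = 0.2
RT = np.eye(4)
RT[:3, :3] = [[1, 0, 0],
              [0, np.cos(a), -np.sin(a)],
              [0, np.sin(a), np.cos(a)]]
RT[:3, 3] = [0.0, 0.5, 1.0]

p = np.array([3.0, 0.0, 7.0, 1.0])          # grid indices (X, Y, Z, 1)
buggy = p * edge_size                        # W scaled too: [0.3, 0, 0.7, 0.1]
fixed = np.append(p[:3] * edge_size, 1.0)    # W kept at 1:  [0.3, 0, 0.7, 1]
unscaled = p                                 # the raw indices read as meters

print(np.allclose(normalized(RT, buggy), normalized(RT, unscaled)))  # True
print(np.allclose(normalized(RT, fixed), normalized(RT, unscaled)))  # False
```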

The fixed snippet:

uv_dims = img.shape[:2] # h, w
grid_dims = (500, 500) # h, w

og_grid = np.concatenate(
                np.meshgrid(
                    np.arange(- grid_dims[0] // 2, (grid_dims[0] + 1) // 2, 1),
                    0, # I want only the floor information, such that height = 0
                    np.arange(grid_dims[1] - 1, -1, -1),
                    1
                    )
                )
og_grid = np.moveaxis(og_grid, 0, 2)

edge_size = .1
og_grid_3dcoords = og_grid * edge_size
og_grid_3dcoords[:, :, 3, :] = 1
print(og_grid_3dcoords.shape)
# (500, 500, 4, 1)
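A slightly more direct way to build a floor grid under the same conventions (X = right, Y = height, Z = forward, far end at the top) is sketched below with a hypothetical function name. Only X and Z are scaled by edge_size, and Y = 0 and W = 1 are set explicitly, so there is no homogeneous coordinate to repair afterwards. Unlike the fixed snippet, the result is (H, W, 4) with rows already running along Z and columns along X, i.e. the transpose of the snippet's layout, so a plain gather such as img[v, u] gives the BEV without the transpose performed by the step-5 loop.

```python
import numpy as np


def make_floor_grid(grid_dims=(500, 500), edge_size=0.1):
    """Homogeneous floor points (Y = 0) laid out as an image.

    Row r is the forward distance Z (far rows at the top), column c is the
    lateral offset X (centered). Returns an (H, W, 4) array of [X, 0, Z, 1]
    with X and Z in meters.
    """
    h, w = grid_dims
    z = np.arange(h - 1, -1, -1) * edge_size          # far -> near, top -> bottom
    x = np.arange(-(w // 2), w - w // 2) * edge_size  # w centered columns
    Z, X = np.meshgrid(z, x, indexing='ij')           # both (h, w)
    return np.stack([X, np.zeros_like(X), Z, np.ones_like(X)], axis=-1)


grid = make_floor_grid()
print(grid.shape)  # (500, 500, 4)
print(grid[0, 0], grid[-1, -1])
```

With the default arguments the grid spans 50 m x 50 m: X from -25 m to 24.9 m and Z from 0 m to 49.9 m.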

The final outcome:

[Image: the resulting BEV image]

Answered By: Rafael Toledo