I’m trying to get a depth map with an uncalibrated method.
I can obtain the fundamental matrix by finding correspondent points with SIFT and then using
cv2.findFundamentalMat. I then use
cv2.stereoRectifyUncalibrated to get the homography matrices for each image. Finally I use
cv2.warpPerspective to rectify and compute the disparity, but this doesn’t create a good depth map. The values are very high so I’m wondering if I have to use
warpPerspective or if I have to calculate a rotation matrix from the homography matrices I got with
I’m not sure of the projective matrix with the case of homography matrix obtained with the
stereoRectifyUncalibrated to rectify.
A part of the code:
#Obtainment of the correspondent point with SIFT sift = cv2.SIFT() ###find the keypoints and descriptors with SIFT kp1, des1 = sift.detectAndCompute(dst1,None) kp2, des2 = sift.detectAndCompute(dst2,None) ###FLANN parameters FLANN_INDEX_KDTREE = 0 index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5) search_params = dict(checks=50) flann = cv2.FlannBasedMatcher(index_params,search_params) matches = flann.knnMatch(des1,des2,k=2) good =  pts1 =  pts2 =  ###ratio test as per Lowe's paper for i,(m,n) in enumerate(matches): if m.distance < 0.8*n.distance: good.append(m) pts2.append(kp2[m.trainIdx].pt) pts1.append(kp1[m.queryIdx].pt) pts1 = np.array(pts1) pts2 = np.array(pts2) #Computation of the fundamental matrix F,mask= cv2.findFundamentalMat(pts1,pts2,cv2.FM_LMEDS) # Obtainment of the rectification matrix and use of the warpPerspective to transform them... pts1 = pts1[:,:][mask.ravel()==1] pts2 = pts2[:,:][mask.ravel()==1] pts1 = np.int32(pts1) pts2 = np.int32(pts2) p1fNew = pts1.reshape((pts1.shape * 2, 1)) p2fNew = pts2.reshape((pts2.shape * 2, 1)) retBool ,rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1fNew,p2fNew,F,(2048,2048)) dst11 = cv2.warpPerspective(dst1,rectmat1,(2048,2048)) dst22 = cv2.warpPerspective(dst2,rectmat2,(2048,2048)) #calculation of the disparity stereo = cv2.StereoBM(cv2.STEREO_BM_BASIC_PRESET,ndisparities=16*10, SADWindowSize=9) disp = stereo.compute(dst22.astype(uint8), dst11.astype(uint8)).astype(np.float32) plt.imshow(disp);plt.colorbar();plt.clim(0,400)#;plt.show() plt.savefig("0gauche.png") #plot depth by using disparity focal length `C1[0,0]` from stereo calibration and `T` the distance between cameras plt.imshow(C1[0,0]*T/(disp),cmap='hot');plt.clim(-0,500);plt.colorbar();plt.show()
Here are the rectified pictures with the uncalibrated method (and
Here are the rectified pictures with the calibrated method:
I don’t know how the difference is so important between the two kind of pictures. And for the calibrated method, it doesn’t seem aligned.
The disparity map using the uncalibrated method:
The depths are calculated with :
with T from the
stereoCalibrate. The values are very high.
———— EDIT LATER ————
I tried to "mount" the reconstruction matrix ([Devernay97], [Garcia01]) with the homography matrix obtained with "stereoRectifyUncalibrated", but the result is still not good. Am I doing this correctly?
Y=np.arange(0,2048) X=np.arange(0,2048) (XX_field,YY_field)=np.meshgrid(X,Y) #I mount the X, Y and disparity in a same 3D array stock = np.concatenate((np.expand_dims(XX_field,2),np.expand_dims(YY_field,2)),axis=2) XY_disp = np.concatenate((stock,np.expand_dims(disp,2)),axis=2) XY_disp_reshape = XY_disp.reshape(XY_disp.shape*XY_disp.shape,3) Ts = np.hstack((np.zeros((3,3)),T_0)) #i use only the translations obtained with the rectified calibration...Is it correct? # I establish the projective matrix with the homography matrix P11 = np.dot(rectmat1,C1) P1 = np.vstack((np.hstack((P11,np.zeros((3,1)))),np.zeros((1,4)))) P1[3,3] = 1 # P1 = np.dot(C1,np.hstack((np.identity(3),np.zeros((3,1))))) P22 = np.dot(np.dot(rectmat2,C2),Ts) P2 = np.vstack((P22,np.zeros((1,4)))) P2[3,3] = 1 lambda_t = cv2.norm(P1[0,:].T)/cv2.norm(P2[0,:].T) #I define the reconstruction matrix Q = np.zeros((4,4)) Q[0,:] = P1[0,:].T Q[1,:] = P1[1,:].T Q[2,:] = lambda_t*P2[1,:].T - P1[1,:].T Q[3,:] = P1[2,:].T #I do the calculation to get my 3D coordinates test =  for i in range(0,XY_disp_reshape.shape): a = np.dot(inv(Q),np.expand_dims(np.concatenate((XY_disp_reshape[i,:],np.ones((1))),axis=0),axis=1)) test.append(a) test = np.asarray(test) XYZ = test[:,:,0].reshape(XY_disp.shape,XY_disp.shape,4)
There might be several possible issues resulting in low-quality
Depth Channel and
Disparity Channel what leads us to low-quality stereo sequence. Here are 6 of those issues:
As the word
stereoRectifyUncalibrated instance method calculates a rectification transformation for you, in case you don’t know or can’t know intrinsic parameters of your stereo pair and its relative position in the environment.
cv.StereoRectifyUncalibrated(pts1, pts2, fm, imgSize, rhm1, rhm2, thres)
# pts1 –> an array of feature points in a first camera # pts2 –> an array of feature points in a first camera # fm –> input fundamental matrix # imgSize -> size of an image # rhm1 -> output rectification homography matrix for a first image # rhm2 -> output rectification homography matrix for a second image # thres –> optional threshold used to filter out outliers
And your method looks this way:
cv2.StereoRectifyUncalibrated(p1fNew, p2fNew, F, (2048, 2048))
So, you do not take into account three parameters:
thres. If a
threshold > 0, all point pairs that don’t comply with a epipolar geometry are rejected prior to computing the homographies. Otherwise, all points are considered inliers. This formula looks like this:
(pts2[i]^t * fm * pts1[i]) > thres # t –> translation vector between coordinate systems of cameras
Thus, I believe that visual inaccuracies might appear due to an incomplete formula’s calculation.
You can read Camera Calibration and 3D Reconstruction on official resource.
interaxial distance between left and right camera lenses must be
not greater than 200 mm. When the
interaxial distance is larger than the
interocular distance, the effect is called
hyperdivergence and results not only in depth exaggeration in the scene but also in viewer’s physical inconvenience. Read Autodesk’s Stereoscopic Filmmaking Whitepaper to find out more on this topic.
Visual inaccuracies in resulted
Disparity Map may occur due to incorrect Camera Mode calculation. Many stereographers prefer
Toe-In camera mode but Pixar, for example, prefers
Parallel camera mode.
In stereoscopy, if a vertical shift occurs (even if one of the views is shifted up by 1 mm) it ruins a robust stereo experience. So, before generating
Disparity Map you must be sure that left and right views of your stereo pair are accordingly aligned. Look at Technicolor Sterreoscopic Whitepaper about 15 common problems in stereo.
Stereo Rectification Matrix:
┌ ┐ | f 0 cx tx | | 0 f cy ty | # use "ty" value to fix vertical shift in one image | 0 0 1 0 | └ ┘
cv.StereoRectify(cameraMatrix1, cameraMatrix2, distCoeffs1, distCoeffs2, imageSize, R, T, R1, R2, P1, P2, Q=None, flags=CV_CALIB_ZERO_DISPARITY, alpha=-1, newImageSize=(0, 0)) -> (roi1, roi2)
Lens Distortion is very important topic in stereo composition. Before generating a
Disparity Map you need to undistort left and right views, after this generate a disparity channel, and then redistort both views again.
For creating a high-quality
Disparity Map you need left and right
Depth Channels that must be pre-generated. When you work in 3D package you can render a high-quality Depth Channel (with crisp edges) with just one click. But generating a high-quality depth channel from video sequence is not easy because stereo pair has to move in your environment for producing an initial data for future depth-from-motion algorithm. If there’s no motion in a frame a depth channel will be extremely poor.
Depthchannel itself has one more drawback – its edges do not match the edges of the RGB because it has no anti-aliasing.
Here I’d like to represent a quick approach to generate a
import numpy as np import cv2 as cv from matplotlib import pyplot as plt imageLeft = cv.imread('paris_left.png', 0) imageRight = cv.imread('paris_right.png', 0) stereo = cv.StereoBM_create(numDisparities=16, blockSize=15) disparity = stereo.compute(imageLeft, imageRight) plt.imshow(disparity, 'gray') plt.show()
OP didn’t provide original images, so I’m using
Tsukuba from the Middlebury data set.
See the publication here for details.
The large black areas of your calibrated rectified images would lead me to believe that for those, calibration was not done very well. There’s a variety of reasons that could be at play, maybe the physical setup, maybe lighting when you did calibration, etc., but there are plenty of camera calibration tutorials out there for that and my understanding is that you are asking for a way to get a better depth map from an uncalibrated setup (this isn’t 100% clear, but the title seems to support this and I think that’s what people will come here to try to find).
Your basic approach is correct, but the results can definitely be improved. This form of depth mapping is not among those that produce the highest quality maps (especially being uncalibrated). The biggest improvement will likely come from using a different stereo matching algorithm. The lighting may also be having a significant effect. The right image (at least to my naked eye) appears to be less well lit which could interfere with the reconstruction. You could first try brightening it to the same level as the other, or gather new images if that is possible. From here out, I’ll assume you have no access to the original cameras, so I’ll consider gathering new images, altering the setup, or performing calibration to be out of scope. (If you do have access to the setup and cameras, then I would suggest checking calibration and using a calibrated method as this will work better).
StereoBM for calculating your disparity (depth map) which does work, but
StereoSGBM is much better suited for this application (it handles smoother edges better). You can see the difference below.
This article explains the differences in more depth:
Block matching focuses on high texture images (think a picture of a tree) and semi-global block matching will focus on sub pixel level matching and pictures with more smooth textures (think a picture of a hallway).
Without any explicit intrinsic camera parameters, specifics about the camera setup (like focal distance, distance between the cameras, distance to the subject, etc.), a known dimension in the image, or motion (to use structure from motion), you can only obtain 3D reconstruction up to a projective transform; you won’t have a sense of scale or necessarily rotation either, but you can still generate a relative depth map. You will likely suffer from some barrel and other distortions which could be removed with proper camera calibration, but you can get reasonable results without it as long as the cameras aren’t terrible (lens system isn’t too distorted) and are set up pretty close to canonical configuration (which basically means they are oriented such that their optical axes are as close to parallel as possible, and their fields of view overlap sufficiently). This doesn’t however appear to be the OPs issue as he did manage to get alright rectified images with the uncalibrated method.
The results are much better:
This result looks similar to the OPs problems (speckling, gaps, wrong depths in some areas).
This result looks much better and uses roughly the same method as the OP, minus the final disparity calculation, making me think the OP would see similar improvements on his images, had they been provided.
There’s a good article about this in the OpenCV docs. I’d recommend looking at it if you need really smooth maps.
The example photos above are frame 1 from the scene
ambush_2 in the MPI Sintel Dataset.
import cv2 import numpy as np import matplotlib.pyplot as plt imgL = cv2.imread("tsukuba_l.png", cv2.IMREAD_GRAYSCALE) # left image imgR = cv2.imread("tsukuba_r.png", cv2.IMREAD_GRAYSCALE) # right image def get_keypoints_and_descriptors(imgL, imgR): """Use ORB detector and FLANN matcher to get keypoints, descritpors, and corresponding matches that will be good for computing homography. """ orb = cv2.ORB_create() kp1, des1 = orb.detectAndCompute(imgL, None) kp2, des2 = orb.detectAndCompute(imgR, None) ############## Using FLANN matcher ############## # Each keypoint of the first image is matched with a number of # keypoints from the second image. k=2 means keep the 2 best matches # for each keypoint (best matches = the ones with the smallest # distance measurement). FLANN_INDEX_LSH = 6 index_params = dict( algorithm=FLANN_INDEX_LSH, table_number=6, # 12 key_size=12, # 20 multi_probe_level=1, ) # 2 search_params = dict(checks=50) # or pass empty dictionary flann = cv2.FlannBasedMatcher(index_params, search_params) flann_match_pairs = flann.knnMatch(des1, des2, k=2) return kp1, des1, kp2, des2, flann_match_pairs def lowes_ratio_test(matches, ratio_threshold=0.6): """Filter matches using the Lowe's ratio test. The ratio test checks if matches are ambiguous and should be removed by checking that the two distances are sufficiently different. If they are not, then the match at that keypoint is ignored. https://stackoverflow.com/questions/51197091/how-does-the-lowes-ratio-test-work """ filtered_matches =  for m, n in matches: if m.distance < ratio_threshold * n.distance: filtered_matches.append(m) return filtered_matches def draw_matches(imgL, imgR, kp1, des1, kp2, des2, flann_match_pairs): """Draw the first 8 mathces between the left and right images.""" # https://docs.opencv.org/4.2.0/d4/d5d/group__features2d__draw.html # https://docs.opencv.org/2.4/modules/features2d/doc/common_interfaces_of_descriptor_matchers.html img = cv2.drawMatches( imgL, kp1, imgR, kp2, flann_match_pairs[:8], None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS, ) cv2.imshow("Matches", img) cv2.imwrite("ORB_FLANN_Matches.png", img) cv2.waitKey(0) def compute_fundamental_matrix(matches, kp1, kp2, method=cv2.FM_RANSAC): """Use the set of good mathces to estimate the Fundamental Matrix. See https://en.wikipedia.org/wiki/Eight-point_algorithm#The_normalized_eight-point_algorithm for more info. """ pts1, pts2 = ,  fundamental_matrix, inliers = None, None for m in matches[:8]: pts1.append(kp1[m.queryIdx].pt) pts2.append(kp2[m.trainIdx].pt) if pts1 and pts2: # You can play with the Threshold and confidence values here # until you get something that gives you reasonable results. I # used the defaults fundamental_matrix, inliers = cv2.findFundamentalMat( np.float32(pts1), np.float32(pts2), method=method, # ransacReprojThreshold=3, # confidence=0.99, ) return fundamental_matrix, inliers, pts1, pts2 ############## Find good keypoints to use ############## kp1, des1, kp2, des2, flann_match_pairs = get_keypoints_and_descriptors(imgL, imgR) good_matches = lowes_ratio_test(flann_match_pairs, 0.2) draw_matches(imgL, imgR, kp1, des1, kp2, des2, good_matches) ############## Compute Fundamental Matrix ############## F, I, points1, points2 = compute_fundamental_matrix(good_matches, kp1, kp2) ############## Stereo rectify uncalibrated ############## h1, w1 = imgL.shape h2, w2 = imgR.shape thresh = 0 _, H1, H2 = cv2.stereoRectifyUncalibrated( np.float32(points1), np.float32(points2), F, imgSize=(w1, h1), threshold=thresh, ) ############## Undistort (Rectify) ############## imgL_undistorted = cv2.warpPerspective(imgL, H1, (w1, h1)) imgR_undistorted = cv2.warpPerspective(imgR, H2, (w2, h2)) cv2.imwrite("undistorted_L.png", imgL_undistorted) cv2.imwrite("undistorted_R.png", imgR_undistorted) ############## Calculate Disparity (Depth Map) ############## # Using StereoBM stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15) disparity_BM = stereo.compute(imgL_undistorted, imgR_undistorted) plt.imshow(disparity_BM, "gray") plt.colorbar() plt.show() # Using StereoSGBM # Set disparity parameters. Note: disparity range is tuned according to # specific parameters obtained through trial and error. win_size = 2 min_disp = -4 max_disp = 9 num_disp = max_disp - min_disp # Needs to be divisible by 16 stereo = cv2.StereoSGBM_create( minDisparity=min_disp, numDisparities=num_disp, blockSize=5, uniquenessRatio=5, speckleWindowSize=5, speckleRange=5, disp12MaxDiff=2, P1=8 * 3 * win_size ** 2, P2=32 * 3 * win_size ** 2, ) disparity_SGBM = stereo.compute(imgL_undistorted, imgR_undistorted) plt.imshow(disparity_SGBM, "gray") plt.colorbar() plt.show()