Object Pose solvePnP

Question:

Unable to get object pose and draw axes with 4 markers

I am trying to get the object pose by following This tutorial for pose estimation. In the video the author uses a chessboard pattern (24×17) and mentions in the comments that any object with detectable markers can be used to estimate the pose.
I am using this Object with only 4 circular markers. I am able to detect the markers and get the (x, y) image points, and I have defined object points relative to an arbitrary reference. My camera is calibrated (I have the camera matrix and distortion coefficients). Following the tutorial, I am unable to draw the object frame.
This is what I have so far.

# (x, y) points of the detected markers; another function processes the image and returns the 4 points
Points = [(x1,y1),(x2,y2),(x3,y3),(x4,y4)]

image_points  = np.array([
        Points[0],
        Points[1],
        Points[2],
        Points[3]
    ],dtype=np.float32)

image_points = np.ascontiguousarray(image_points).reshape((4,1,2))
    
object_points  = np.array([
        (80,56,0),
        (80,72,0),
        (57,72,0),
        (57,88,0)],
        dtype=np.float32).reshape((4,1,3)) #Set Z as 0

axis = np.float32([[5,0,0], [0,5,0], [0,0,-5]]).reshape(-1,3)

imgpts, jac = cv2.projectPoints(axis, rotation_vector, translation_vector, mtx, dist)

What am I missing?

This is what I am trying to achieve.
Goal

This is the current result
Current

The camera's distance from the object is fixed. I need to track translation and rotation in x and y.

EDIT:

Sample Image markings

Updated Object

Updated Result

Pixel values of the markers, from top-left to bottom-right:

Point_A = (1081, 544) 
Point_B = (1090, 782) 
Point_C = (824, 785)  #Preferred Origin Point
Point_D = (826, 1050) 

Camera Parameters

mtx: [[2.34613584e+03 0.00000000e+00 1.24680613e+03]
      [0.00000000e+00 2.34637787e+03 1.11379469e+03]
      [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
dist: 
     [[-0.05595266  0.07570472  0.00200983  0.00073797 -0.30768105]]

Python Code

Asked By: AqibF


Answers:

Here’s an example in C++ with your image and object, but I extracted the image points again (because the provided values didn't fit the image) and I used a pinhole camera model (no distortion). Results should be similar or better if you use the actual camera parameters.

int main()
{
    cv::Mat img = cv::imread("C:/data/StackOverflow/solvePnp.png");

    // assuming a pinhole camera, because the actual intrinsics/distortion is unknown.
    cv::Mat intrinsics = cv::Mat::eye(3,3,CV_64FC1);
    intrinsics.at<double>(0, 2) = img.cols / 2.0;
    intrinsics.at<double>(1, 2) = img.rows / 2.0;
    intrinsics.at<double>(0, 0) = 1000;
    intrinsics.at<double>(1, 1) = 1000;
    
    // assumed: no distortion.
    std::vector<double> distCoeffs;
    distCoeffs.resize(4, 0);

    std::vector<cv::Point3f> objPoints;
    // object points as provided in the question
    objPoints.push_back({80.0f,56.0f,0.0f});
    objPoints.push_back({ 80.0f,72.0f, 0.0f });
    objPoints.push_back({ 57.0f,72.0f, 0.0f });
    objPoints.push_back({ 57.0f,88.0f, 0.0f });
    
    // we want the third point to be the origin of the object, so we have to shift the coordinate system:
    cv::Point3f origin = objPoints[2];

    for (size_t i = 0; i < objPoints.size(); ++i)
    {
        objPoints[i] = objPoints[i] - origin;
    }

    std::vector<cv::Point2f> imgPoints;

    /*
    // WRONG PROVIDED VALUES -- they don't match the image!
    imgPoints.push_back({ 1081, 544 });
    imgPoints.push_back({ 1090, 782 });
    imgPoints.push_back({ 824, 785 });
    imgPoints.push_back({ 826, 1050 });
    */

    // image points read from the image
    imgPoints.push_back({ 1123, 558 });
    imgPoints.push_back({ 1132, 814 });
    imgPoints.push_back({ 851, 818 });
    imgPoints.push_back({ 852, 1097 });


    cv::Vec3d rot;
    cv::Vec3d trans;
    cv::solvePnP(objPoints, imgPoints, intrinsics, distCoeffs, rot, trans);

    std::vector<cv::Point3f> axis;
    axis.push_back({ 0,0,0 });
    axis.push_back({ 10,0,0 });
    axis.push_back({ 0,10,0 });
    axis.push_back({ 0,0,10 });

    std::vector<cv::Point2f> axisImg;

    cv::projectPoints(axis, rot, trans, intrinsics, distCoeffs, axisImg);

    cv::line(img, axisImg[0], axisImg[1], cv::Scalar(0, 0, 255),5);
    cv::line(img, axisImg[0], axisImg[2], cv::Scalar(0, 255, 0),5);
    cv::line(img, axisImg[0], axisImg[3], cv::Scalar(255, 0, 0),5);

    std::cout << axisImg[0] << std::endl;
    std::cout << axisImg[1] << std::endl;
    std::cout << axisImg[2] << std::endl;
    std::cout << axisImg[3] << std::endl;

    for (size_t i = 0; i < imgPoints.size(); ++i)
    {
        cv::circle(img, imgPoints[i], 5, cv::Scalar(i * 255, (i == 0) ? 255 : 0, i * 50));
    }

    cv::imwrite("C:/data/StackOverflow/solvePnp_RESULT.png", img);
    cv::resize(img, img, cv::Size(), 0.25, 0.25);
    cv::imshow("img", img);

    cv::waitKey(0);
}

red = X
green = Y
blue = Z

Result

Answered By: Micka