ORB data for computer vision: What are the items in a row?

Question:

I’m looking at the MediaEval 2018 memorability challenge (here).

One of the features they describe is ORB features. I got the data from the challenge, and I’m trying to understand how the ORB data works.

If I run this code:

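import pickle

# Change the file name; this is just an example.
with open('/Memorability_data/ORB/video10-0.p', 'rb') as f:
    data = pickle.load(f)

print(len(data))     # number of rows (keypoints) in this file
print(data[0])       # the first row
print(len(data[0]))  # number of items in that row
print('*')
for item in data[0]:
    print(item)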

The output is:

500
((1143.0, 372.0), 31.0, 169.99649047851562, 0.001837713411077857, 0, -1, array([228, 118, 156,  58, 232, 237,  21, 206, 219, 127,  33,  56, 134,
       216,  79,  27, 129,  17, 234,  19,  39, 103, 202, 112,  20,  18,
        85, 127, 216,  89, 203,   7], dtype=uint8))
7
*
(1143.0, 372.0)
31.0
169.99649047851562
0.001837713411077857
0
-1
[228 118 156  58 232 237  21 206 219 127  33  56 134 216  79  27 129  17
 234  19  39 103 202 112  20  18  85 127 216  89 203   7]

So I understand that each video has one file, each file is 500 rows long, and each row looks similar to the one above. I’m trying to understand what these rows mean.

I found this, and they describe:

static Ptr<ORB> cv::ORB::create(int   nfeatures = 500,
                                float scaleFactor = 1.2f,
                                int   nlevels = 8,
                                int   edgeThreshold = 31,
                                int   firstLevel = 0,
                                int   WTA_K = 2,
                                int   scoreType = ORB::HARRIS_SCORE,
                                int   patchSize = 31,
                                int   fastThreshold = 20)

I’m not understanding what the data in my file is. It clearly doesn’t match the example I’ve found: e.g. the last parameter above is an int (fastThreshold = 20), whereas the last item in my row is an array.

Can someone either explain what the items in each row are, or point me to a reference that documents them? (Or has the data I was sent been pre-processed in some way? Can anyone tell?) My ultimate aim is to convert this data to a CSV file, but I don’t know what the column headings should be.

I found similar SO questions (e.g. here and here), and I looked at the paper linked in one of the answers, but I’m still not clear.

Asked By: Slowat_Kela


Answers:

I cannot determine the meaning of every field, but I think I can guess it for some:

  • The first tuple is the (x, y) position of the feature (most likely in pixel coordinates).
  • 31.0 is the size of the feature, given by the patch size.
  • 169.996 should be the orientation of the feature, in degrees.
  • The array at the end is the feature’s description, generated by the BRIEF descriptor. It holds 32 8-bit values; if you write out the bit pattern of each of these numbers you end up with 256 ones and zeros. This binary description is what is used for matching.
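To make the last point concrete, here is a small stdlib-only sketch that expands the 32 descriptor bytes into their 256 bits and compares two descriptors by Hamming distance (the number of differing bits), which is the usual distance for binary descriptors such as ORB’s. The helper names are hypothetical, and the most-significant-bit-first ordering is just one convention (it matches the default of numpy.unpackbits). The functions accept a plain list or the uint8 NumPy array stored in each row.

```python
def descriptor_to_bits(descriptor):
    """Return the descriptor as a string of '0'/'1' characters, 8 per byte (MSB first)."""
    return "".join(format(int(byte), "08b") for byte in descriptor)


def hamming_distance(desc_a, desc_b):
    """Count the bits that differ between two equal-length binary descriptors."""
    if len(desc_a) != len(desc_b):
        raise ValueError("descriptors must have the same length")
    # XOR each byte pair and count the 1 bits in the result.
    return sum(bin(int(a) ^ int(b)).count("1") for a, b in zip(desc_a, desc_b))


# Descriptor of the first keypoint, copied from the output in the question.
desc = [228, 118, 156, 58, 232, 237, 21, 206, 219, 127, 33, 56, 134,
        216, 79, 27, 129, 17, 234, 19, 39, 103, 202, 112, 20, 18,
        85, 127, 216, 89, 203, 7]

bits = descriptor_to_bits(desc)
print(len(bits))                      # → 256 (32 bytes * 8 bits)
print(bits[:8])                       # → 11100100 (the first byte, 228)
print(hamming_distance(desc, desc))   # → 0

other = list(desc)
other[0] ^= 1                         # flip the lowest bit of the first byte
print(hamming_distance(desc, other))  # → 1
```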
Answered By: Mannorok

The seven items in your data[0] row are, in order:

  1. _pt: x & y coordinates of the keypoint
  2. _size: keypoint diameter
  3. _angle: keypoint orientation (in degrees)
  4. _response: detector response at the keypoint (that is, the strength of the keypoint)
  5. _octave: pyramid octave in which the keypoint was detected
  6. _class_id: object id (-1 when unset)
  7. descriptor: the keypoint’s binary descriptor

Information was taken from here.
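Since the goal is a CSV file, the field list above already suggests the headings. Below is a minimal sketch, assuming every row has the seven-item layout shown in the question; the column names and helper functions are my own hypothetical choices, not something defined by the challenge. Loading the real files needs NumPy installed, because the descriptors are pickled NumPy arrays.

```python
import csv
import pickle

# Hypothetical column names, based on the seven fields listed above.
KEYPOINT_COLUMNS = ["x", "y", "size", "angle", "response", "octave", "class_id"]


def make_header(n_descriptor_bytes=32):
    """Keypoint columns followed by one column per descriptor byte."""
    return KEYPOINT_COLUMNS + [f"desc_{i}" for i in range(n_descriptor_bytes)]


def row_to_record(row):
    """Flatten one (pt, size, angle, response, octave, class_id, descriptor) row."""
    (x, y), size, angle, response, octave, class_id, descriptor = row
    return [x, y, size, angle, response, octave, class_id] + [int(b) for b in descriptor]


def orb_pickle_to_csv(pickle_path, csv_path):
    """Write one per-video ORB pickle as CSV; returns the number of keypoints written."""
    with open(pickle_path, "rb") as f:
        rows = pickle.load(f)  # descriptors are NumPy arrays, so NumPy must be installed
    # Size the descriptor columns from the first row (32 bytes for default ORB).
    n_bytes = len(rows[0][6]) if rows else 32
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(make_header(n_bytes))
        for row in rows:
            writer.writerow(row_to_record(row))
    return len(rows)
```

For example, orb_pickle_to_csv('/Memorability_data/ORB/video10-0.p', 'video10-0.csv') would write a header line plus one line per keypoint. Splitting the descriptor into 32 integer columns keeps the CSV easy to load into a dataframe; storing it as a single hex-string column would be a more compact alternative.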

Answered By: BroML