Is there a full list of potential labels that Google's Vision API will return?

Question:

I’ve been testing out Google’s Vision API to attach labels to different images.

For a given image, I’ll get back something like this:

"google_labels": {
    "responses": [{
        "labelAnnotations": [{
            "score": 0.8966763,
            "description": "food",
            "mid": "/m/02wbm"
        }, {
            "score": 0.80512983,
            "description": "produce",
            "mid": "/m/036qh8"
        }, {
            "score": 0.73635191,
            "description": "juice",
            "mid": "/m/01z1kdw"
        }, {
            "score": 0.69849229,
            "description": "meal",
            "mid": "/m/0krfg"
        }, {
            "score": 0.53875387,
            "description": "fruit",
            "mid": "/m/02xwb"
        }]
    }]
}

My questions are:

  1. Does anybody know if Google has published their full list of labels (['produce', 'meal', ...]) and where I could find that?
  2. Are those labels structured in any way? For example, is it known that ‘food’ is a superset of ‘produce’?

I’m guessing ‘No’ and ‘No’ as I haven’t been able to find anything, but, maybe not. Thanks!

Asked By: Hillary Sanders


Answers:

The labels can be searched through the Google Knowledge Graph Search API:

https://developers.google.com/knowledge-graph/reference/rest/v1/

It is linked at the bottom of the Google Vision API documentation:

https://cloud.google.com/vision/docs/labels


Edit: more info

OK, MIDs starting with /g/ are Google entities and MIDs starting with /m/ are Freebase identifiers, but the Google Knowledge Graph API doesn’t always return them.
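
A lookup of a single MID can be sketched roughly as below. This is only a minimal sketch: it assumes the Knowledge Graph Search API's `entities:search` endpoint with its `ids`, `key` and `limit` query parameters, and a JSON response whose `itemListElement` entries carry a `result.name` field. The function names (`build_lookup_url`, `extract_name`, `lookup_mid`) are hypothetical, and you need your own API key. Since the API does not return every MID, the parser returns None when nothing matches.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Knowledge Graph Search API.
KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_lookup_url(mid, api_key):
    """Build a request URL that asks for exactly one MID."""
    params = {"ids": mid, "key": api_key, "limit": 1}
    return KG_ENDPOINT + "?" + urlencode(params)


def extract_name(response):
    """Return the first entity name in a parsed response, or None if empty."""
    for element in response.get("itemListElement", []):
        name = element.get("result", {}).get("name")
        if name:
            return name
    return None


def lookup_mid(mid, api_key):
    """Fetch one MID over the network (requires a valid API key)."""
    with urlopen(build_lookup_url(mid, api_key)) as resp:
        return extract_name(json.load(resp))
```

Calling `lookup_mid("/m/02wbm", "YOUR_API_KEY")` would then return a name for MIDs the API knows, and None for the ones it doesn’t.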

This data is public and can be downloaded, but the database contains too many records, and Google hasn’t published which of them it uses.

Example of a MID returned by the Vision API and the corresponding record in Wikidata:

{
    desc: "institution",
    mid: "/m/01r28c",
    score: 72.29216694831848,
    confidence: 0,
    locations: [ ],
    properties: [ ]
},

https://www.wikidata.org/wiki/Q178706
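
Going from a MID to a Wikidata record could be automated along these lines. It is a sketch under assumptions: it uses the public Wikidata SPARQL endpoint and Wikidata's "Freebase ID" property (P646), and the helper names and the User-Agent string are made up for illustration. Not every MID will have a Wikidata entry, so the result may be an empty list.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(mid):
    """SPARQL query for items whose Freebase ID (P646) equals the MID."""
    return 'SELECT ?item WHERE { ?item wdt:P646 "%s" . }' % mid


def extract_items(response):
    """Pull the item URIs out of a parsed SPARQL JSON result."""
    return [b["item"]["value"] for b in response["results"]["bindings"]]


def find_wikidata_items(mid):
    """Run the query over the network and return matching item URIs."""
    params = urlencode({"query": build_query(mid), "format": "json"})
    # Hypothetical User-Agent; the service asks clients to identify themselves.
    request = Request(WDQS_ENDPOINT + "?" + params,
                      headers={"User-Agent": "mid-lookup-example/0.1"})
    with urlopen(request) as resp:
        return extract_items(json.load(resp))
```

The returned values are entity URIs, so the Q-number is the last path segment of each one.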


The last Freebase dump can be downloaded here:

https://developers.google.com/freebase/

Answered By: Wiliam

While I can’t verify the completeness of the database, the Google Open Images project has a list of around 20k classifications.

If you browse to the download page, you can download the list with those descriptions as a CSV.

I checked a few reference images within CloudVision and had the following results:

ID / CloudVision Classification / OpenImages Classification
1. 01ssh5 / Shoulder / Shoulder (Body Part)
2. 09cx8 / Finger / Finger
3. 068jd / Photograph / Photograph
4. 01k74n / Facial expression / Facial expression
5. 04hgtk / Head / Human Head

I was able to find all IDs with the same meaning in the CSV, so as a base list this should be sufficient. Be aware that you should always match by ID, not by classification, as the descriptions differ slightly in a few cases.
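
Matching by ID could look something like the sketch below. It assumes the downloaded CSV has two columns (MID, description) and no header row, which you should check against the file you actually download. The helper names are hypothetical and the sample rows are just stand-ins for the real file. IDs are normalized so that both `01ssh5` and `/m/01ssh5` work.

```python
import csv
import io


def normalize_mid(mid):
    """Accept '01ssh5' or '/m/01ssh5' and return the '/m/...' form."""
    return mid if mid.startswith("/") else "/m/" + mid


def load_descriptions(csv_file):
    """Read 'mid,description' rows into a dict keyed by MID."""
    return {row[0]: row[1] for row in csv.reader(csv_file) if len(row) >= 2}


def split_known_unknown(vision_labels, descriptions):
    """Match Vision labels against the CSV by MID, never by description."""
    known, unknown = {}, []
    for label in vision_labels:
        mid = normalize_mid(label["mid"])
        if mid in descriptions:
            # Keep both names so differences in wording are visible.
            known[mid] = (label["description"], descriptions[mid])
        else:
            unknown.append(mid)
    return known, unknown


# Stand-in for the downloaded file; use open(path, newline="") in practice.
sample_csv = io.StringIO("/m/04hgtk,Human head\n/m/09cx8,Finger\n")
descriptions = load_descriptions(sample_csv)
labels = [{"mid": "/m/04hgtk", "description": "Head"}]
known, unknown = split_known_unknown(labels, descriptions)
```

Anything that ends up in `unknown` is a MID that CloudVision returned but the list does not contain.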

If you find any IDs in CloudVision but not in the list, I’d be interested to know in the comments!

Answered By: James Cameron