Reading PASCAL VOC annotations in Python


I have annotations in xml files such as this one, which follows the PASCAL VOC convention:

<database>synthetic initialization</database>
<annotation>PASCAL VOC2007</annotation>

What is the cleanest way of retrieving for example the fields filename and bndbox in Python?

I was trying to use ElementTree, which seems to be the official Python solution, but I can’t make it work.

My code so far:

from xml.etree import ElementTree as ET
tree = ET.parse("data/all/annotations/" + file)
fn = tree.find('filename').text
boxes = tree.findall('bndbox')

This produces:

fn == 'chanel1.jpg'
boxes == []

So it successfully extracts the filename field, but not the bndbox elements.

Asked By: Jsevillamol



Here is a fairly simple solution to your problem.

It returns the filename and the box coordinates as a nested list, with one [xmin, ymin, xmax, ymax] entry per object.
I once struggled with bndbox tags that were mixed up (ymin, xmin, …) or in other strange orders, so this code reads each coordinate by its tag name rather than by its position.

I have since updated the code. Thanks to craq and Pritesh Gohil, you were absolutely right.

Hope it helps…

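import xml.etree.ElementTree as ET

def read_content(xml_file: str):
    """Return the filename and a list of [xmin, ymin, xmax, ymax] boxes."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    # filename is a direct child of the root, so read it once up front
    filename = root.find('filename').text

    list_with_all_boxes = []

    for boxes in root.iter('object'):
        # look up each coordinate by tag name, so the tag order does not matter
        ymin = int(boxes.find("bndbox/ymin").text)
        xmin = int(boxes.find("bndbox/xmin").text)
        ymax = int(boxes.find("bndbox/ymax").text)
        xmax = int(boxes.find("bndbox/xmax").text)

        list_with_single_boxes = [xmin, ymin, xmax, ymax]
        # collect this object's box; without this the returned list stays empty
        list_with_all_boxes.append(list_with_single_boxes)

    return filename, list_with_all_boxes

name, boxes = read_content("file.xml")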
Answered By: pix_1
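
For reference, the empty list in the question comes from how ElementTree resolves paths: a bare tag name such as 'bndbox' matches only direct children of the element being searched, and in a VOC file bndbox is nested inside object. A minimal sketch of the difference follows. The sample XML string and its values are made up for illustration and are not taken from the question's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down VOC-style annotation used only for illustration.
SAMPLE = """
<annotation>
    <filename>example.jpg</filename>
    <object>
        <name>cat</name>
        <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
    </object>
    <object>
        <name>dog</name>
        <bndbox><xmin>5</xmin><ymin>6</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(SAMPLE.strip())

# A bare tag name only matches direct children of root, so this finds nothing.
direct_children = root.findall("bndbox")

# './/' searches all descendants; 'object/bndbox' spells out the nesting.
any_depth = root.findall(".//bndbox")
explicit_path = root.findall("object/bndbox")

# Read each coordinate by tag name so the order of the child tags is irrelevant.
boxes = [
    [int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
    for bb in any_depth
]
```

Either './/bndbox' or 'object/bndbox' should be a drop-in replacement for the 'bndbox' argument in the question's findall call.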

Another option is to use the third-party xmltodict library to load the VOC XML into a Python dict.

import xmltodict

with open('/path/to/voc.xml') as file:
    # read the whole file, then let xmltodict turn it into nested dicts
    file_data = file.read()
    dict_data = xmltodict.parse(file_data)
Answered By: Octave
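
To get from that dict to the fields asked about, something like the following might work. It is a sketch under assumptions: the sample XML string is made up for illustration, and it relies on xmltodict's force_list option so that object is always a list, even when the file contains a single object (without it, a single object comes back as a plain dict rather than a list). xmltodict keeps every value as a string, so the coordinates are converted with int() here.

```python
import xmltodict

# Hypothetical, trimmed-down VOC-style annotation used only for illustration.
SAMPLE = """<annotation>
    <filename>example.jpg</filename>
    <object>
        <name>cat</name>
        <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
    </object>
</annotation>"""

# force_list makes 'object' a list even though there is only one here.
dict_data = xmltodict.parse(SAMPLE, force_list=("object",))
annotation = dict_data["annotation"]

filename = annotation["filename"]

# Look up each coordinate by key; values are strings, so convert to int.
boxes = [
    [int(obj["bndbox"][tag]) for tag in ("xmin", "ymin", "xmax", "ymax")]
    for obj in annotation.get("object", [])
]
```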

My attempt at it, slightly more readable than the accepted answer, offering the option to convert to 0-based pixel coordinates, and pairing the name of the object rather than the name of the file with each box’s coordinates.

Output example:

{'excavator': {'xmin': 0, 'ymin': 0, 'xmax': 1265, 'ymax': 587},
 'dump_truck': {'xmin': 259, 'ymin': 159, 'xmax': 713, 'ymax': 405}}

import xml.etree.ElementTree as ET

def read_Pascal_VOC(xml_file,do_0_based):
    # Pascal VOC is 1-based, but more recent formats like MS COCO are 0-based
    # see, e.g.,
    if do_0_based:
        to_subtract = 1
    else:
        to_subtract = 0

    tree = ET.parse(xml_file)
    root = tree.getroot()

    boxes = dict()

    for box in root.iter('object'):

        name = box.find('name').text
        bb = box.find('bndbox')
        # dict to remove any ambiguity ordering-wise
        coords = dict(xmin = bb.find('xmin').text, 
                      ymin = bb.find('ymin').text, 
                      xmax = bb.find('xmax').text, 
                      ymax = bb.find('ymax').text)
        coords = {k:int(v)-to_subtract for k,v in coords.items()}
        if name in boxes:
            boxes[name] = boxes[name] + [coords]
        else:
            boxes[name] = [coords]

    return boxes
Answered By: Antoine