Extract Coordinates from KML BatchGeo File with Python

Question:

I’ve uploaded some addresses to BatchGeo and downloaded the resulting KML file, from which I want to extract the coordinates. I managed to prettify the jumbled text file with an online tool, but I don’t know how to parse it to extract the coordinates.

<?xml version="1.0" ?>
<kml >
    <Document>
        <Placemark>
            <name>...</name>
            <description>....</description>
            <Point>
                <coordinates>-3.1034345755337,57.144817425039,0</coordinates>
            </Point><address>...</address>
            <styleUrl>#0</styleUrl>
        </Placemark>
    </Document>
</kml>

There seem to be several KML libraries for Python, but not much in the way of documentation (e.g. pyKML). Using the tutorial, I have got this far and created an ‘lxml.etree._ElementTree’ object, but I’m not sure of its attributes:

from pykml import parser

kml_file = "BatchGeo.kml"

with open(kml_file) as f:
    doc = parser.parse(f)

coordinate = doc.Element("coordinates")
print coordinate

This gives the error:

AttributeError: 'lxml.etree._ElementTree' object has no attribute 'Element'

So how do I get a list of coordinates? Thanks.

Asked By: eamon1234


Answers:

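from pykml import parser

# Read as bytes so the parser can honor the file's own encoding declaration
with open('BatchGeo.kml', 'rb') as f:
    root = parser.fromstring(f.read())

# Coordinates of the first Placemark, as a "lon,lat,alt" string
print(root.Document.Placemark.Point.coordinates)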

See the pykml docs.

Hope that helps!

Answered By: eqzx
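
The attribute chain above reaches only the first Placemark, while the question asks for a list. As a rough sketch of collecting every Point into a list, the following uses the standard library's xml.etree.ElementTree instead of pykml (a swapped-in technique, not the answerer's method). The helper names local_name and extract_point_coords and the inline sample are hypothetical, and the code assumes each Point's coordinates element holds a single lon,lat[,alt] tuple.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}coordinates'."""
    return tag.rsplit('}', 1)[-1]


def extract_point_coords(kml_text):
    """Return a list of (lon, lat) floats, one per <Point> in the KML text."""
    root = ET.fromstring(kml_text)
    coords = []
    for elem in root.iter():
        if local_name(elem.tag) != 'Point':
            continue
        for child in elem:
            if local_name(child.tag) == 'coordinates' and child.text:
                # Assumption: one "lon,lat[,alt]" tuple per Point
                lon, lat = child.text.strip().split(',')[:2]
                coords.append((float(lon), float(lat)))
    return coords


# Hypothetical sample shaped like the file in the question
sample = """<kml><Document>
  <Placemark><Point><coordinates>-3.1034345755337,57.144817425039,0</coordinates></Point></Placemark>
  <Placemark><Point><coordinates>-2.5,56.25,0</coordinates></Point></Placemark>
</Document></kml>"""

print(extract_point_coords(sample))
```

For a file on disk, the contents can be passed in as bytes, for example extract_point_coords(open('BatchGeo.kml', 'rb').read()). Matching on the local tag name is a design choice that lets the same code work whether or not the kml element declares a namespace.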

For some reason, my KML had no Point element; it had a LineString element instead. Furthermore, the text of the LineString’s coordinates element contained some extra characters, so I couldn’t just split on commas and had to use re instead. For eastern North America, longitudes are two-digit negative numbers and latitudes are two-digit positive numbers, so the negative sign gives a simple way to tell them apart. So, to get the lists of coordinates:

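from pykml import parser
import re

kml_file = '/path/to/file.kml'
# Binary mode lets the parser honor the file's own encoding declaration
with open(kml_file, 'rb') as f:
    doc = parser.parse(f)
root = doc.getroot()
coords = root.Document.Placemark.LineString.coordinates.text
# Longitudes: a minus sign, two digits, a literal dot, then the decimals
lons = re.findall(r'(-[0-9]{2}\.[0-9]*)', coords)
# Latitudes: the same shape, but not preceded by a minus sign
lats = re.findall(r'[^-]([0-9]{2}\.[0-9]*)', coords)
print(lons, lats)

Answered By: J B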
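
The regular expressions above depend on the sign and digit count of the values, so they only suit certain regions. In KML, coordinate tuples are separated by whitespace and the components within a tuple by commas, so when the extra characters are only spaces and newlines, plain splitting may be enough. The sketch below assumes exactly that, which may not match the file J B was working with; parse_coordinates and the sample text are hypothetical.

```python
def parse_coordinates(text):
    """Parse KML coordinate text into a list of (lon, lat) float pairs."""
    pairs = []
    for chunk in text.split():       # tuples are separated by whitespace
        parts = chunk.split(',')     # lon,lat[,alt] within one tuple
        if len(parts) < 2:
            continue                 # skip anything that is not a tuple
        pairs.append((float(parts[0]), float(parts[1])))
    return pairs


# Hypothetical LineString text with line breaks and indentation
coords_text = """
    -75.5,45.25,0 -75.75,45.5,0
    -76.0,45.75,0
"""

pairs = parse_coordinates(coords_text)
lons = [lon for lon, _ in pairs]
lats = [lat for _, lat in pairs]
print(lons, lats)
```

Because this relies on position within each tuple and not on the sign, it also handles positive longitudes and values with one or three integer digits.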