Extract Coordinates from KML BatchGeo File with Python
Question:
I’ve uploaded some addresses to BatchGeo and downloaded the resulting KML file, from which I want to extract the coordinates. I managed to prettify the jumbled text file online, but I don’t know how to parse it to extract the coordinates.
<?xml version="1.0" ?>
<kml >
  <Document>
    <Placemark>
      <name>...</name>
      <description>....</description>
      <Point>
        <coordinates>-3.1034345755337,57.144817425039,0</coordinates>
      </Point>
      <address>...</address>
      <styleUrl>#0</styleUrl>
    </Placemark>
  </Document>
</kml>
There seem to be several KML libraries for Python, but not much in the way of documentation (e.g. pyKML). Using the tutorial, I got this far and created an ‘lxml.etree._ElementTree’ object, but I’m not sure what its attributes are:
from pykml import parser

kml_file = "BatchGeo.kml"
with open(kml_file) as f:
    doc = parser.parse(f)

coordinate = doc.Element("coordinates")
print(coordinate)
This gives the error:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'Element'
So how do I get a list of coordinates? Thanks.
Answers:
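from pykml import parser

# Read as bytes so lxml accepts an <?xml ... encoding="..."?> declaration
with open('BatchGeo.kml', 'rb') as f:
    root = parser.fromstring(f.read())

# fromstring() returns the root <kml> element, so child elements are attributes
print(root.Document.Placemark.Point.coordinates.text)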
See the pyKML documentation.
Hope that helps!
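The snippet above reads only the first Placemark. If the file holds several (presumably one per uploaded address), the sketch below collects the text of every <coordinates> element. It uses the standard library's xml.etree.ElementTree instead of pyKML, an assumption made only so it runs without extra packages, and coordinate_strings is a hypothetical helper name. It also shows why the question's code failed: parse() returns an ElementTree (as pyKML's parser.parse() does, per the error message), and you need getroot() to reach the actual elements.

```python
import io
import xml.etree.ElementTree as ET


def coordinate_strings(kml_source):
    """Return the raw text of every <coordinates> element in a KML document.

    kml_source may be a filename or a binary file-like object (hypothetical
    helper). Tags are compared by local name, so the same code works whether
    or not the <kml> element declares a namespace.
    """
    tree = ET.parse(kml_source)  # an ElementTree, not an Element
    root = tree.getroot()        # the <kml> Element; descendants hang off this
    found = []
    for elem in root.iter():
        # Namespaced tags look like "{http://www.opengis.net/kml/2.2}coordinates"
        if isinstance(elem.tag, str) and elem.tag.rsplit('}', 1)[-1] == 'coordinates':
            found.append((elem.text or '').strip())
    return found


# Trimmed-down version of the file shown in the question
sample = b"""<?xml version="1.0" ?>
<kml>
<Document>
<Placemark>
<Point><coordinates>-3.1034345755337,57.144817425039,0</coordinates></Point>
</Placemark>
</Document>
</kml>"""

print(coordinate_strings(io.BytesIO(sample)))
# ['-3.1034345755337,57.144817425039,0']
```

Each returned string still has to be split into numbers; because it walks every element, the same helper also picks up the coordinates of LineString or Polygon geometries.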
For some reason, my KML had no Point element; it had a LineString element instead. Furthermore, the coordinates text of the LineString contained some extra characters, so I couldn’t just split on comma delimiters and had to use re instead. In eastern North America, longitudes are negative two-digit numbers and latitudes are positive two-digit numbers, so there was a simple way to separate them by the minus sign. So to get the list of coordinates:
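from pykml import parser
import re

kml_file = '/path/to/file.kml'

with open(kml_file, 'rb') as f:
    doc = parser.parse(f)

root = doc.getroot()
coords = root.Document.Placemark.LineString.coordinates.text

# Longitudes: a minus sign, two digits, a literal '.', then the decimals
lons = re.findall(r'(-[0-9]{2}\.[0-9]*)', coords)
# Latitudes: two digits, a literal '.', decimals, NOT preceded by a minus sign
lats = re.findall(r'[^-]([0-9]{2}\.[0-9]*)', coords)
print(lons, lats)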
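The regex approach depends on every longitude being negative and every latitude being two digits. A more general alternative is to follow the structure of the KML coordinates format itself: tuples are separated by whitespace, and the values inside a tuple are separated by commas in lon,lat[,alt] order. The minimal sketch below assumes the "extra characters" were whitespace such as newlines or indentation; parse_coordinates is a hypothetical helper name and the LineString text is made-up sample data.

```python
def parse_coordinates(text):
    """Split a KML coordinates string into (lon, lat, alt) float tuples.

    Tuples are separated by whitespace; values inside a tuple by commas,
    in lon,lat[,alt] order. Missing altitudes default to 0.0.
    """
    points = []
    for tuple_text in text.split():  # no-arg split() treats any whitespace run as one separator
        values = [float(v) for v in tuple_text.split(',')]
        lon, lat = values[0], values[1]
        alt = values[2] if len(values) > 2 else 0.0  # altitude is optional
        points.append((lon, lat, alt))
    return points


# Hypothetical LineString text with newlines and indentation around the tuples
line_string = """
    -75.1234,40.5678,0
    -75.2345,40.6789,0
"""
points = parse_coordinates(line_string)
lons = [p[0] for p in points]
lats = [p[1] for p in points]
print(lons, lats)
# [-75.1234, -75.2345] [40.5678, 40.6789]
```

Because it never inspects the sign or the number of digits, this also works for the positive-latitude, negative-longitude Point in the question and for coordinates anywhere else in the world, and it returns floats rather than strings.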