Find and Replace Values in XML using Python

Question:

I am looking to edit XML files using python. I want to find and replace keywords in the tags. In the past, a co-worker had set up template XML files and used a “find and replace” program to replace these key words. I want to use python to find and replace these key words with values. I have been teaching myself the Elementtree module, but I am having trouble trying to do a find and replace. I have attached a snid-bit of my XML file. You will seen some variables surrounded by % (ie %SITEDESCR%) These are the words I want to replace and then save the XML to a new file. Any help or suggestions would be great.

Thanks,
Mike

<metadata>
<idinfo>
<citation>
<citeinfo>
 <origin>My Company</origin>
 <pubdate>05/04/2009</pubdate>
 <title>POLYGONS</title>
 <geoform>vector digital data</geoform>
 <onlink>\C$ArcGISDevelopmentGeodatabasePDA_STD_05_25_2009.gdb</onlink>
</citeinfo>
</citation>
 <descript>
 <abstract>This dataset represents the mapped polygons developed from the field data for the %SITEDESCR%.</abstract>
 <purpose>This dataset was created to accompany some stuff.</purpose>
 </descript>
<timeperd>
<timeinfo>
<rngdates>
 <begdate>%begdate%</begdate>
 <begtime>unknown</begtime>
 <enddate>%enddate%</enddate>
 <endtime>unknown</endtime>
 </rngdates>
 </timeinfo>
 <current>ground condition</current>
 </timeperd>
Asked By: Mike

||

Answers:

If you just want to replace the bits enclosed with %, then this isn’t really an XML problem. You can easily do it with regex:

import re
xmlstring = open('myxmldocument.xml', 'r').read()
substitutions = {'SITEDESCR': 'myvalue', ...}
pattern = re.compile(r'%([^%]+)%')
xmlstring = re.sub(pattern, lambda m: substitutions[m.group(1)], xmlstring)
Answered By: Ismail Badawi

To replace the placeholders all you need is to read the file line by line and replace:

for line in open(template_file_name,'r'):
  output_line = line
  output_line = string.replace(output_line, placeholder, value)
  print output_line 
Answered By: Rostislav Matl

The basics:

from xml.etree import ElementTree as et
tree = et.parse(datafile)
tree.find('idinfo/timeperd/timeinfo/rngdates/begdate').text = '1/1/2011'
tree.find('idinfo/timeperd/timeinfo/rngdates/enddate').text = '1/1/2011'
tree.write(datafile)

You can shorten the path if the tag name is unique. This syntax finds the first node at any depth level in the tree.

tree.find('.//begdate').text = '1/1/2011'
tree.find('.//enddate').text = '1/1/2011'

Also, read the documentation, esp. the XPath support for locating nodes.

Answered By: Mark Tolonen

You can modify in place and safely do so with xpath rather than full paths or worse, regex. See below and check out the docs on etree

from lxml import etree
raw = """
<node>
<begdate>%begdate%</begdate>
<begtime>unknown</begtime>
<enddate>%enddate%</enddate>
<endtime>unknown</endtime>
</node>"""
nodes = etree.fromstring(raw.strip())
shh = [setattr(x, "text", "DATE: 2021-01-01") for x in nodes.xpath(".//*[.='%begdate%']")]
nodes.xpath(".//begdate//text()")
['DATE: 2021-01-01']
Answered By: Carl Boneri
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.