How to search and replace text in an XML file using Python?

Question:

How do I search an entire xml file for a specific text pattern and then replace each occurrence of that text with new text pattern in Python 3.5?

Everything else (format, attributes, comments, etc.) needs to remain as it is in the original xml file.

I am running Python 3.5.1 on Windows (win32).

Specifically, I would like to replace each occurrence of “FEATURE NAME” with “THIS WORKED” and replace each occurrence of “FEATURE NUMBER” with “12345”.

I have been trying to learn Python and xml.etree.ElementTree but cannot figure this out. I already looked at “Search and replace a line in a .xml file in Python”, “Search and replace a line in a file in Python”, and “How to search and replace text in a file using Python?” and other existing Q/A’s on this site but cannot figure this out – I’m not an experienced programmer, so please let me know if more input is needed . Your help is greatly appreciated!!!

Here is a copy of what the xml code looks like when I open it in Notepad (except I added spaces to indent each line and hit return for some lines when I pasted it into this question):

<description-topic>
    <access-info>
        <index-term-set>
            <index-term>
                <primary>FID FEATURE NUMBER</primary>
            </index-term>
            <index-term>
                <primary>FEATURE NAME</primary>
            </index-term>
            <index-term>
                <primary>Common features</primary>
                <secondary>FID FEATURE NUMBER</secondary>
            </index-term>
        </index-term-set>
    </access-info>
    <title>FEATURE NUMBER - FEATURE NAME</title>
    <block>
        <label>Platform</label>
        <comment>REVIEWERS: I guessed at the FEATURE NAME</comment>
        <para>
            This feature applies to the following platforms: FEATURE NAME<!--Check the values--></para>
    </block>
    <block branch="no">
        <label>Feature Benefits</label>
        <para>
            <comment>REVIEWERS: What do we put here? See template (link given in review email) for more information.</comment>
        </para>
    </block>
    <block branch="no">
        <label>Dependencies</label>
        <para/>
        <subblock>
            <label>Features</label>
            <comment>What FEATURE NAME do we put here?</comment>
        </subblock>
        <subblock>
            <label>Hardware</label>
            <comment>What FEATURE NAME do we put here?</comment>
            <para>This feature applies to the following: FEATURE NUMBER and text.</para><?Pub Caret -1?>
        </subblock>
        <subblock>
            <label>Dependencies outside the eNodeB</label>
            <comment>What FEATURE NAME do we put here?</comment>
        </subblock>
    </block>
    <block branch="no">
        <label>Impacts</label>
        <comment>REVIEWERS: What FEATURE NUMBER do we put here?</comment>
        <para>
            <comment/>
        </para>
    </block>
</description-topic>

Here is the latest code I am trying to get to work:

from xml.etree import ElementTree as et
tree = et.parse('Atemplate2.xml')
tree.find('description-topic/access-info/index-term-set/index-term/primary/').text = '12345'
tree.write('Atemplate2.xml')

I get the following error:
Traceback (most recent call last):
File “ajktest18.py”, line 15, in
tree.find(‘description-topic/access-info/index-term-set/index-term/primary/’).text = ‘12345’

AttributeError: ‘NoneType’ object has no attribute ‘text’

I would prefer to be able to search and modify any occurrences in the entire file, but I can’t figure out how to get to even one specific occurrence of the text I am searching for.

Here is the code I tried to use to find the path:

import xml.etree.ElementTree as ET
tree = ET.parse('Atemplate.xml')
root = tree.getroot()

print(root.tag, root.attrib, root.text)

for child in root:
    print(child.tag, child.attrib, child.text)
for label in root.iter('label'):
    print(label.tag, label.attrib, label.text)
for title in root.iter('title'):
    print(title.attrib)

I also tried the following code:

with open('Atemplate2.xml') as f:
    tree = ET.parse(f)
    root = tree.getroot()

for elem in root.getiterator():
    try:
        elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
        elem.text = elem.text.replace('FEATURE NUMBER', '12345')
    except AttributeError:
        pass

tree.write('output.xml')

but that gives the following error:

File "<pyshell#40>", line 2, in <module>
    tree = ET.parse(f)
File "C:MyPathPython35-32libxmletreeElementTree.py", line 1182, in parse
    tree.parse(source, parser)
File "C: MyPath Python35-32libxmletreeElementTree.py", line 594, in parse
    self._root = parser._parse_whole(source)
File "C: MyPath Python35-32libencodingscp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: ‘charmap’ codec can’t decode byte 0x9d in position 1119: character maps to

#

#

FINAL UPDATE – Here is the code that worked for me in the end (thank u, Jarad!):

import lxml.etree as ET
#using lxml instead of xml preserved the comments

#adding the encoding when the file is opened and written is needed to avoid a charmap error
with open('filename.xml', encoding="utf8") as f:
  tree = ET.parse(f)
  root = tree.getroot()


  for elem in root.getiterator():
    try:
      elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
      elem.text = elem.text.replace('FEATURE NUMBER', '123456')
    except AttributeError:
      pass

#tree.write('output.xml', encoding="utf8")
# Adding the xml_declaration and method helped keep the header info at the top of the file.
tree.write('output.xml', xml_declaration=True, method='xml', encoding="utf8")
Asked By: AJK

||

Answers:

Caveats:

  • I have never worked with the xml.etree.ElementTree library
  • I have never worked with it because I never find myself manipulating XML
  • I don’t know if this is the “best” way compared to someone that knows the library in and out
  • Commentors seem set on judging you instead of helping you out

This is a modification from this excellent answer. The thing is, you need to read the XML file in and parse it.

import xml.etree.ElementTree as ET

with open('xmlfile.xml', encoding='latin-1') as f:
  tree = ET.parse(f)
  root = tree.getroot()

  for elem in root.getiterator():
    try:
      elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
      elem.text = elem.text.replace('FEATURE NUMBER', '123456')
    except AttributeError:
      pass

tree.write('output.xml', encoding='latin-1')

Note that you can change the encoding parameter to something else such as: utf-8, cp1252, ISO-8859-1, etc. Really depends on your system and file.

Answered By: Jarad

Only this solution worked for me.

import lxml.etree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
tree = root.getroottree()
for elem in root.getiterator():
    try:
        elem.text = elem.text.replace('abc', 'xyz')
    except Exception:
        pass
tree.write('output.xml', xml_declaration=True, method='xml', encoding="utf-16") 
Answered By: user3349907

Just a note: Python deprecated the .getiterator() a few versions ago so you’ll likely see an AttributeError if you use it now.

.iter() is the correct method to call with the xml code referenced in above snippets.

Answered By: Zayn Patel
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.