Convert Python ElementTree to string
Question:
Whenever I call ElementTree.tostring(e), I get the following error message:
AttributeError: 'Element' object has no attribute 'getroot'
Is there any other way to convert an ElementTree object into an XML string?
Traceback:
Traceback (most recent call last):
  File "Development/Python/REObjectSort/REObjectResolver.py", line 145, in <module>
    cm = integrateDataWithCsv(cm, csvm)
  File "Development/Python/REObjectSort/REObjectResolver.py", line 137, in integrateDataWithCsv
    xmlstr = ElementTree.tostring(et.getroot(),encoding='utf8',method='xml')
AttributeError: 'Element' object has no attribute 'getroot'
Answers:
Element objects have no .getroot() method. Drop that call, and the .tostring() call works:
xmlstr = ElementTree.tostring(et, encoding='utf8', method='xml')
You only need to use .getroot() if you have an ElementTree instance.
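To illustrate the distinction, here is a minimal sketch that serializes either kind of object. The helper name to_xml_string is hypothetical and not part of the standard library; it only calls .getroot() when it is actually handed an ElementTree.

```python
from xml.etree import ElementTree


def to_xml_string(obj):
    """Serialize an Element or an ElementTree to a str (hypothetical helper)."""
    # Only ElementTree instances wrap a root element; plain Elements do not.
    if isinstance(obj, ElementTree.ElementTree):
        obj = obj.getroot()
    return ElementTree.tostring(obj, encoding="unicode", method="xml")


element = ElementTree.Element("Person", Name="John")
tree = ElementTree.ElementTree(element)

from_element = to_xml_string(element)
from_tree = to_xml_string(tree)
print(from_element)
print(from_tree)
```

Both calls should print the same XML, since the tree simply wraps the element.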
Other notes:
- This produces a bytestring, which in Python 3 is the bytes type. If you must have a str object, you have two options:
  - Decode the resulting bytes value, from UTF-8: xmlstr.decode("utf8")
  - Use encoding='unicode'; this avoids an encode / decode cycle:
    xmlstr = ElementTree.tostring(et, encoding='unicode', method='xml')
- If you wanted the UTF-8 encoded bytestring value or are using Python 2, take into account that ElementTree doesn’t properly detect utf8 as the standard XML encoding, so it’ll add a <?xml version='1.0' encoding='utf8'?> declaration. Use utf-8 or UTF-8 (with a dash) if you want to prevent this. When using encoding="unicode" no declaration header is added.
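The notes above can be checked with a short Python 3 sketch. The element built here is only a stand-in for the asker's et object, and the variable names are illustrative.

```python
from xml.etree import ElementTree

# Stand-in element; the asker's real data is not shown in the question.
et = ElementTree.Element("Person", Name="John")

with_decl = ElementTree.tostring(et, encoding="utf8")      # bytes, XML declaration added
without_decl = ElementTree.tostring(et, encoding="utf-8")  # bytes, no declaration
as_text = ElementTree.tostring(et, encoding="unicode")     # str, no declaration

print(with_decl.startswith(b"<?xml"))     # declaration present?
print(without_decl.startswith(b"<?xml"))  # declaration present?
print(type(as_text).__name__)
```

Decoding without_decl as UTF-8 should give the same text as as_text, which is why encoding='unicode' is described as avoiding an encode / decode cycle.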
How do I convert ElementTree.Element to a String?
For Python 3:
xml_str = ElementTree.tostring(xml, encoding='unicode')
For Python 2:
xml_str = ElementTree.tostring(xml, encoding='utf-8')
Example usage (Python 3)
from xml.etree import ElementTree
xml = ElementTree.Element("Person", Name="John")
xml_str = ElementTree.tostring(xml, encoding='unicode')
print(xml_str)
Output:
<Person Name="John" />
Explanation
ElementTree.tostring() returns a bytestring by default in Python 2 & 3. This is an issue because Python 3 switched to using Unicode for strings.
In Python 2 you could use the str type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. […]
To make the distinction between text and binary data clearer and more pronounced, [Python 3] made text and binary data distinct types that cannot blindly be mixed together.
Source: Porting Python 2 Code to Python 3
If you know what version of Python is being used, you should specify the encoding as unicode or utf-8. For reference, I’ve included a comparison of .tostring() results between Python 2 and Python 3.
ElementTree.tostring(xml)
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />
ElementTree.tostring(xml, encoding='unicode')
# Python 3: <Person Name="John" />
# Python 2: LookupError: unknown encoding: unicode
ElementTree.tostring(xml, encoding='utf-8')
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />
ElementTree.tostring(xml).decode()
# Python 3: <Person Name="John" />
# Python 2: <Person Name="John" />
Note: While xml_str = ElementTree.tostring().decode() is compatible with both Python 2 & 3, Christopher Rucinski pointed out that this method fails when dealing with non-Latin characters.
Thanks to Martijn Peters for pointing out that the str datatype changed between Python 2 and 3.
Why not use str()?
In most scenarios, using str() would be the "canonical" way to convert an object to a string. However, using str() with Element returns the object’s location in memory as a hexstring, rather than a string representation of the object’s data.
from xml.etree import ElementTree
xml = ElementTree.Element("Person", Name="John")
print(str(xml)) # <Element 'Person' at 0x00497A80>
Non-Latin Answer Extension
This extends @Stevoisiak’s answer to cover non-Latin characters. Only one method displays the non-Latin characters as-is, and that method differs between Python 3 and Python 2.
Input
xml = ElementTree.fromstring('<Person Name="크리스" />')
xml = ElementTree.Element("Person", Name="크리스") # Read Note about Python 2
NOTE: In Python 2, when calling the tostring(...) code, assigning xml with ElementTree.Element("Person", Name="크리스") will raise an error…
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 0: ordinal not in range(128)
Output
ElementTree.tostring(xml)
# Python 3 (크리스): b'<Person Name="&#53356;&#47532;&#49828;" />'
# Python 3 (John): b'<Person Name="John" />'
# Python 2 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 2 (John): <Person Name="John" />
ElementTree.tostring(xml, encoding='unicode')
# Python 3 (크리스): <Person Name="크리스" /> <-------- Python 3
# Python 3 (John): <Person Name="John" />
# Python 2 (크리스): LookupError: unknown encoding: unicode
# Python 2 (John): LookupError: unknown encoding: unicode
ElementTree.tostring(xml, encoding='utf-8')
# Python 3 (크리스): b'<Person Name="\xed\x81\xac\xeb\xa6\xac\xec\x8a\xa4" />'
# Python 3 (John): b'<Person Name="John" />'
# Python 2 (크리스): <Person Name="크리스" /> <-------- Python 2
# Python 2 (John): <Person Name="John" />
ElementTree.tostring(xml).decode()
# Python 3 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 3 (John): <Person Name="John" />
# Python 2 (크리스): <Person Name="&#53356;&#47532;&#49828;" />
# Python 2 (John): <Person Name="John" />
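For anyone who wants to re-run the comparison on Python 3, here is a minimal sketch. The variable names are illustrative; the default encoding of tostring() is us-ascii, so characters outside ASCII are written as numeric character references unless another encoding is requested.

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="크리스")

ascii_bytes = ElementTree.tostring(xml)                   # default: us-ascii bytes
utf8_bytes = ElementTree.tostring(xml, encoding="utf-8")  # UTF-8 encoded bytes
text = ElementTree.tostring(xml, encoding="unicode")      # str, characters kept as-is

print(ascii_bytes)
print(utf8_bytes)
print(text)
```

Only the encoding="unicode" result is a str that shows the Korean characters directly; the UTF-8 bytes need an explicit .decode("utf-8") to get the same text.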
If you just need this for debugging, to see what the XML looks like, then instead of print(xml.etree.ElementTree.tostring(e)) you can use dump like this:
xml.etree.ElementTree.dump(e)
This works with both Element and ElementTree objects as e, so there should be no need for getroot.
The documentation of dump says:
xml.etree.ElementTree.dump(elem)
Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
The exact output format is implementation dependent. In this version, it’s written as an ordinary XML file.
elem is an element tree or an individual element.
Changed in version 3.8: The dump() function now preserves the attribute order specified by the user.
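A small sketch of dump() on both kinds of object follows. Capturing the output with contextlib.redirect_stdout is only there to make the result inspectable; in normal debugging you would call dump(e) directly.

```python
import contextlib
import io
from xml.etree import ElementTree

element = ElementTree.Element("Person", Name="John")
tree = ElementTree.ElementTree(element)

# dump() writes to sys.stdout, so redirect it temporarily to capture the text.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    ElementTree.dump(element)  # an individual element
    ElementTree.dump(tree)     # an element tree, no getroot() needed

captured = buffer.getvalue()
print(captured)
```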
I had the same problem in Python 3.8 and none of the previous answers solved it. The issue is that ElementTree is both the name of a module and of a class within it. Using an alias makes it clear:
from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as XET
...
ElementTree.tostring(...) # AttributeError: this name is the class, which has no tostring()
XET.tostring(...) # Works
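The name clash can be made visible with a short check. This is only a sketch; XET is the alias used in the answer above and the Person element is an arbitrary example.

```python
import xml.etree.ElementTree as XET
from xml.etree.ElementTree import ElementTree

# The class ElementTree wraps a document; tostring() lives on the module.
class_has_tostring = hasattr(ElementTree, "tostring")
module_has_tostring = hasattr(XET, "tostring")
print(class_has_tostring, module_has_tostring)

root = XET.Element("Person", Name="John")
xml_str = XET.tostring(root, encoding="unicode")
print(xml_str)
```

If the code does `from xml.etree.ElementTree import ElementTree`, then ElementTree.tostring refers to an attribute of the class, which does not exist, hence the AttributeError.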
Input Sample File Content:
<?xml version="1.0" encoding="ISO-8859-1"?>
<UPDATE>
<DATA><SET_DOC ID="249865"/></DATA>
</UPDATE>
Code to convert a specific element to a string:
import lxml.etree as ET
samplexml = ET.parse(r"D:sample.xml")
sampleroot = samplexml.getroot()
for dataElement in sampleroot.iter('DATA'):
    updatext = ET.tostring(dataElement)
    print(updatext)
Output:
b'<DATA><SET_DOC ID="249865"/></DATA>\n'
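The same approach can be sketched with the standard library instead of lxml. This is a swap of libraries, not the answer's own code: the sample is embedded as bytes rather than read from a file, and the standard library writes empty elements with a space before the closing "/>", so the output differs slightly from lxml's.

```python
import xml.etree.ElementTree as ET

# Sample content from above, embedded as bytes so the encoding declaration is honoured.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<UPDATE>
<DATA><SET_DOC ID="249865"/></DATA>
</UPDATE>
"""

root = ET.fromstring(SAMPLE)

# Serialize every DATA element to a str; the element's trailing newline (its tail) is included.
fragments = [ET.tostring(el, encoding="unicode") for el in root.iter("DATA")]
for fragment in fragments:
    print(fragment.strip())
```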