lxml library not adding newlines or indentation to tree after adding new element
Question:
The title is self-explanatory. Before tagging this as a duplicate, please consider that I have checked this answer and it does not work for me, because I do not get the correct format even in sys.stdout, not only when writing to a file. I have the following XML (test.xml):
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
child_2.text = '2016-07-29T12:00:00'
child_3.text = '1'
for i in [child_1, child_2, child_3]:
    field.append(i)
a.append(field)
s = etree.tostring(tree, pretty_print=True)
print(s.decode('utf-8'))
OUTPUT
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        <Field_1><FieldName>dateTime</FieldName><FieldValue>2016-07-29T12:00:00</FieldValue><FieldIndex>1</FieldIndex></Field_1></DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>
EXPECTED
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
          <Field_1>
            <FieldName>dateTime</FieldName>
            <FieldValue>2016-07-29T12:00:00</FieldValue>
            <FieldIndex>1</FieldIndex>
          </Field_1>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>
I really do not understand why the new field I am adding is not formatted as expected, because if I print only field, everything looks fine:
s = etree.tostring(field, pretty_print=True)
print(s.decode('utf-8'))
#<Field_1 xmlns="http://www." xmlns:soap="http://www...">
#  <FieldName>dateTime</FieldName>
#  <FieldValue>2016-07-29T12:00:00</FieldValue>
#  <FieldIndex>1</FieldIndex>
#</Field_1>
NOTE: I am using Python 3.4 (this is why I have to call .decode('utf-8'); otherwise I just get a bytes literal).
Answers:
It works if you add this line after a = get_data_fields():
a.text = None
lxml cannot always determine what whitespace is ignorable, so in some cases the whitespace needs to be removed manually.
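A minimal sketch of the effect, assuming a tiny stand-in document rather than the full SOAP envelope (the inline XML and the way `a` is obtained here are illustrative, not the asker's code). The whitespace-only text inside the otherwise empty element is what makes the serializer leave that element's content unformatted:

```python
from lxml import etree

# Hypothetical stand-in for the parsed test.xml: an empty element whose only
# content is the line break before its closing tag.
a = etree.fromstring(b"<DataFields>\n</DataFields>")

field = etree.SubElement(a, "Field_1")
etree.SubElement(field, "FieldName").text = "dateTime"

# The leftover whitespace in a.text switches off pretty-printing
# for everything inside <DataFields>.
before = etree.tostring(a, pretty_print=True).decode("utf-8")

# Dropping it lets the serializer indent the newly added children.
a.text = None
after = etree.tostring(a, pretty_print=True).decode("utf-8")

print(before)
print(after)
```

In this sketch, `before` keeps Field_1 and its child on one line, while `after` puts each element on its own indented line.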
See http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output:
If you want to be sure all blank text is removed from an XML document (or just more blank text than the parser does by itself), you have to use either a DTD to tell the parser which whitespace it can safely ignore, or remove the ignorable whitespace manually after parsing, e.g. by setting all tail text to None:
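The manual cleanup the FAQ describes can be sketched as a small helper. Note that strip_blank_text is a hypothetical name, not an lxml function, and that clearing whitespace-only .text in addition to .tail is an assumption that goes slightly beyond the quoted passage; it is included because in the question it is the element's .text that holds the stray whitespace. The inline XML is an illustrative stand-in for test.xml.

```python
from lxml import etree


def strip_blank_text(root):
    """Remove whitespace-only .text and .tail from every element under root.

    Hypothetical helper (not part of lxml). Text that contains anything
    other than whitespace is left untouched.
    """
    for element in root.iter(tag=etree.Element):
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None


# Illustrative stand-in document, not the asker's full file.
root = etree.fromstring(
    b"<Transaction>\n  <DataFields>\n  </DataFields>\n</Transaction>"
)
data_fields = root[0]

field = etree.SubElement(data_fields, "Field_1")
etree.SubElement(field, "FieldIndex").text = "1"

# Clean up once, right before serializing.
strip_blank_text(root)
pretty = etree.tostring(root, pretty_print=True).decode("utf-8")
print(pretty)
```

This only makes sense for data-oriented XML where whitespace between tags carries no meaning; in mixed-content documents the same cleanup could remove significant spacing.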