Empty lines while using minidom.toprettyxml

Question

I’ve been using minidom.toprettyxml to prettify my XML file.
When I create the XML file and use this method, everything works great. But if I use it after modifying the XML file (for example, after adding extra nodes) and then write it back to XML, I get empty lines. Each time I update the file, I get more and more empty lines…

My code:

file.write(prettify(xmlRoot))


def prettify(elem):
    rough_string = xml.tostring(elem, 'utf-8')  # xml as ElementTree
    reparsed = mini.parseString(rough_string)  # mini as minidom
    return reparsed.toprettyxml(indent=" ")

And the result:

<?xml version="1.0" ?>
<testsuite errors="0" failures="3" name="TestSet_2013-01-23 14_28_00.510935" skip="0"     tests="3" time="142.695" timestamp="2013-01-23 14:28:00.515460">




    <testcase classname="TC test" name="t1" status="Failed" time="27.013"/>




    <testcase classname="TC test" name="t2" status="Failed" time="78.325"/>


    <testcase classname="TC test" name="t3" status="Failed" time="37.357"/>
</testsuite>

Any suggestions?

Thanks.

Asked By: Igal

Answer 1

I found a solution here: http://code.activestate.com/recipes/576750-pretty-print-xml/

Then I modified it to take a string instead of a file.

from xml.dom.minidom import parseString

pretty_print = lambda data: '\n'.join([line for line in parseString(data).toprettyxml(indent=' '*2).split('\n') if line.strip()])

Output:

<?xml version="1.0" ?>
<testsuite errors="0" failures="3" name="TestSet_2013-01-23 14_28_00.510935" skip="0" tests="3" time="142.695" timestamp="2013-01-23 14:28:00.515460">
  <testcase classname="TC test" name="t1" status="Failed" time="27.013"/>
  <testcase classname="TC test" name="t2" status="Failed" time="78.325"/>
  <testcase classname="TC test" name="t3" status="Failed" time="37.357"/>
</testsuite>

This may help you work it into your function a little more easily:

def new_prettify():
    reparsed = parseString(CONTENT)  # CONTENT is the XML string to format
    print('\n'.join([line for line in reparsed.toprettyxml(indent=' '*2).split('\n') if line.strip()]))
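
For Python 3, the same filter can be wrapped around the prettify function from the question. This is a minimal sketch rather than the original recipe: the name prettify_no_blank_lines and the sample document are made up for illustration. It simulates reading the output back, adding a node, and formatting again.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString


def prettify_no_blank_lines(elem, indent="  "):
    """Pretty-print an ElementTree element and drop whitespace-only lines."""
    rough = ET.tostring(elem, encoding="utf-8")
    pretty = parseString(rough).toprettyxml(indent=indent)
    return "\n".join(line for line in pretty.split("\n") if line.strip())


# Hypothetical sample document, formatted for the first time.
root = ET.fromstring('<testsuite><testcase name="t1"/></testsuite>')
first = prettify_no_blank_lines(root)

# Simulate "read the file back, add a node, write it again".
root = ET.fromstring(first)
ET.SubElement(root, "testcase", name="t2")
second = prettify_no_blank_lines(root)
print(second)
```

The second pass should contain both testcase elements and no whitespace-only lines, because the filter runs after minidom has added its extra newlines.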

Answered By: Joe

Answer 2

Use this to resolve the problem with the extra lines:

toprettyxml(indent=' ', newl='\r', encoding="utf-8")
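
A quick Python 3 check of what this call returns, as a sketch with an invented sample document. Because encoding is passed, toprettyxml returns bytes rather than str, so the result has to be written to a file opened in binary mode ("wb"), not passed to a text-mode file.write.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document with no whitespace between the tags.
doc = parseString("<testsuite><testcase name='t1'/></testsuite>")

# With an explicit encoding the return value is a byte string.
pretty = doc.toprettyxml(indent=" ", newl="\r", encoding="utf-8")

print(type(pretty))
print(pretty)
```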

Answered By: Giovani Hgo

Answer 3

I found an easy solution for this problem: just change the last line of your prettify() so that it becomes:

def prettify(elem):
    rough_string = xml.tostring(elem, 'utf-8')  # xml as ElementTree
    reparsed = mini.parseString(rough_string)  # mini as minidom
    return reparsed.toprettyxml(indent=" ", newl='')

Answered By: Sidali Smaili

Answer 4

I am having the same issue with Python 2.7 (32-bit) on a Windows 10 machine. The issue seems to be that when Python parses XML text into an ElementTree object, it adds some annoying line feeds to either the “text” or “tail” attribute of each element.

This script removes such line break characters:

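import re

def removeAnnoyingLines(elem):
    hasWords = re.compile(r"\w")  # matches any word character
    for element in elem.iter():
        # str(None) is "None", so missing text/tail values are left untouched
        if not re.search(hasWords, str(element.tail)):
            element.tail = ""
        if not re.search(hasWords, str(element.text)):
            element.text = ""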

Use this function before “pretty-printing” your tree:

import xml.dom.minidom
import xml.etree.ElementTree

removeAnnoyingLines(element)
myXml = xml.dom.minidom.parseString(xml.etree.ElementTree.tostring(element))
print(myXml.toprettyxml())

It worked for me. I hope it works for you!
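
To see where the extra lines come from, here is a minimal Python 3 sketch with an invented sample document. It uses str.strip() instead of the regular expression above, and the helper name strip_layout_whitespace is hypothetical. Re-parsing pretty-printed output keeps the indentation in text and tail, and toprettyxml then adds its own newlines on top of it.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def prettify(elem):
    """The prettify() from the question, with no cleanup."""
    return minidom.parseString(ET.tostring(elem)).toprettyxml(indent=" ")


def strip_layout_whitespace(elem):
    """Clear text/tail values that contain only whitespace."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


root = ET.fromstring("<testsuite><testcase name='t1'/></testsuite>")
once = prettify(root)

# Re-parsing the pretty output stores the indentation in text and tail.
reparsed = ET.fromstring(once)
print(repr(reparsed.text), repr(reparsed[0].tail))  # '\n ' '\n'

twice = prettify(reparsed)         # whitespace-only lines appear here
strip_layout_whitespace(reparsed)
cleaned = prettify(reparsed)       # identical to the first pass again
```

Comparing once, twice and cleaned shows the growth on the second pass and its disappearance once the whitespace-only values are cleared.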

Answered By: Ricardo Alejos

Answer 5

Here’s a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    # Remove the whitespace-only lines. Join with "\n" rather than os.linesep:
    # a file opened in text mode already translates "\n" to the platform line ending.
    xml_string = "\n".join([s for s in xml_string.splitlines() if s.strip()])
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.
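
If Python 3.9 or later is available, xml.etree.ElementTree.indent offers another standard-library route that skips minidom entirely. This is an alternative to the functions above, not part of the original approach, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that already contains stray blank lines.
root = ET.fromstring(
    "<testsuite>\n\n\n  <testcase name='t1'/>\n\n</testsuite>"
)
ET.SubElement(root, "testcase", name="t2")

# ET.indent (Python 3.9+) rewrites whitespace-only text/tail in place.
ET.indent(root, space="  ")
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```

Because ET.indent overwrites whitespace-only text and tail values instead of adding to them, calling it on every save should not make the file grow.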

Answered By: Josh Correia

Answer 6

The problem is that minidom doesn’t handle newline characters well (on Windows).
It doesn’t need them anyway, so removing them from the string is the solution:

reparsed = mini.parseString(rough_string)  # mini as minidom

Replace it with:

reparsed = mini.parseString(rough_string.replace('\n',''))  # mini as minidom

But be aware that this solution works only on Windows.

Answered By: DexBG

Answer 7

Since minidom’s toprettyxml inserts too many lines, my solution was to delete the lines that carry no useful data by checking whether each line contains at least one '<' character (there may be a better idea). This worked perfectly for a similar issue I had (on Windows).

text = md.toprettyxml() # get the prettyxml string from minidom Document md
# text = text.replace('    ', '\t') # for those using tabs :)
spl = text.split('\n') # split lines into a list
spl = [i for i in spl if '<' in i] # keep only the lines that contain a tag
text = '\n'.join(spl) # join the filtered lines back into a single string

# write the result to file (I use codecs because I needed the utf-8 encoding)
import codecs # if not imported yet (just to show this import is needed)
with codecs.open('yourfile.xml', 'w', encoding='utf-8') as f:
    f.write(text)
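
One case where the '<' test is too strict is text content that spans several lines: an inner line of text has no '<' in it and gets dropped. The sketch below, using a made-up document, compares that filter with a plain whitespace test (line.strip()).

```python
from xml.dom.minidom import parseString

# Hypothetical document whose text content spans several lines.
doc = parseString("<log><entry>first\nsecond\nthird</entry></log>")
lines = doc.toprettyxml(indent="  ").split("\n")

keep_tags = [line for line in lines if "<" in line]       # the '<' filter
keep_nonblank = [line for line in lines if line.strip()]  # whitespace test

print("\n".join(keep_tags))      # the line "second" is missing
print("\n".join(keep_nonblank))  # the line "second" is kept
```

If the XML only ever holds single-line values, both filters should give the same result.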

Answered By: VKitsune
