Python: adding xml schema attributes with lxml
Question:
I’ve written a script that prints out all the .xml files in the current directory in xml format, but I can’t figure out how to add the encoding=’utf-8′?>
<databaseChangeLog
encoding=’utf-8′?>
<databaseChangeLog>
<include file="cats.xml"/>
<include file="dogs.xml"/>
<include file="fish.xml"/>
<include file="meerkats.xml"/>
</databaseChangLog>
Here is my script:
import lxml.etree
import lxml.builder
import glob
E = lxml.builder.ElementMaker()
ROOT = E.databaseChangeLog
DOC = E.include
# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]
the_doc = ROOT(*files)
str = lxml.etree.tostring(the_doc, pretty_print=True, xml_declaration=True, encoding='utf-8')
print str
I’ve found some examples online of explicitly setting namespace attributes, here and here, but to be honest they went a little over my head as I am just starting out. Is there another way to add these xmlns attributes to the databaseChangeLog tag?
Answers:
import lxml.etree as ET
import lxml.builder
import glob
dbchangelog = 'http://www.host.org/xml/ns/dbchangelog'
xsi = 'http://www.host.org/2001/XMLSchema-instance'
E = lxml.builder.ElementMaker(
nsmap={
None: dbchangelog,
'xsi': xsi})
ROOT = E.databaseChangeLog
DOC = E.include
# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]
the_doc = ROOT(*files)
the_doc.attrib['{{{pre}}}schemaLocation'.format(pre=xsi)] = 'www.host.org/xml/ns/dbchangelog'
print(ET.tostring(the_doc,
pretty_print=True, xml_declaration=True, encoding='utf-8'))
yields
<?xml version='1.0' encoding='utf-8'?>
<databaseChangeLog xsi_schemaLocation="www.host.org/xml/ns/dbchangelog">
<include file="test.xml"/>
</databaseChangeLog>
I’ve written a script that prints out all the .xml files in the current directory in xml format, but I can’t figure out how to add the encoding=’utf-8′?>
<databaseChangeLog
encoding=’utf-8′?>
<databaseChangeLog>
<include file="cats.xml"/>
<include file="dogs.xml"/>
<include file="fish.xml"/>
<include file="meerkats.xml"/>
</databaseChangLog>
Here is my script:
import lxml.etree
import lxml.builder
import glob
E = lxml.builder.ElementMaker()
ROOT = E.databaseChangeLog
DOC = E.include
# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]
the_doc = ROOT(*files)
str = lxml.etree.tostring(the_doc, pretty_print=True, xml_declaration=True, encoding='utf-8')
print str
I’ve found some examples online of explicitly setting namespace attributes, here and here, but to be honest they went a little over my head as I am just starting out. Is there another way to add these xmlns attributes to the databaseChangeLog tag?
import lxml.etree as ET
import lxml.builder
import glob
dbchangelog = 'http://www.host.org/xml/ns/dbchangelog'
xsi = 'http://www.host.org/2001/XMLSchema-instance'
E = lxml.builder.ElementMaker(
nsmap={
None: dbchangelog,
'xsi': xsi})
ROOT = E.databaseChangeLog
DOC = E.include
# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]
the_doc = ROOT(*files)
the_doc.attrib['{{{pre}}}schemaLocation'.format(pre=xsi)] = 'www.host.org/xml/ns/dbchangelog'
print(ET.tostring(the_doc,
pretty_print=True, xml_declaration=True, encoding='utf-8'))
yields
<?xml version='1.0' encoding='utf-8'?>
<databaseChangeLog xsi_schemaLocation="www.host.org/xml/ns/dbchangelog">
<include file="test.xml"/>
</databaseChangeLog>