Python not able to save some CLOB xml into xml file
Question:
I have this xml saved as CLOB in my oracle database:
<?xml version="1.0" encoding="UTF-8"?>
<DCResponse>
...
</DCResponse>
and with this python code I am able to save content into xml file:
sql = "select extract(xmltype.createxml(xml), '//DCResponse').getStringVal() from table t where id = 2"
for row in cursor.execute(sql):
    print(row[0])
    with open("output.xml", "w") as f:
        f.write(row[0])
Instead with this xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<PIPEDocument xsi_schemaLocation="urn:XML-PIPE PIPEDocument.xsd" ReferenceNumber="567862650" CreationDate="20200115155255" Version="1.0">
...
</PIPEDocument>
I’m not able to extract the content with Python. Running the code below gives "write() argument must be str, not None" in the Python console:
sql = "select extract(xmltype.createxml(xml), '//PIPEDocument').getStringVal() from table t where id = 7"
for row in cursor.execute(sql):
    print(row[0])
    with open("output.xml", "w") as f:
        f.write(row[0])
In my oracle client the output of the below sql query, used in python, is null:
select extract(xmltype.createxml(xml), '//PIPEDocument').getStringVal() from table t where id = 7;
while the xml content is present in my DB:
select xml from table where id =7
Not sure what is the issue, maybe the keyword ‘//PIPEDocument’ in the select query or different encoding between the 2 XML files, but no idea how to fix this.
Please Help
Best Regards
Giancarlo
Answers:
The problem is that your second XML document uses namespaces. The element <PIPEDocument> is in a namespace, and your XPath expression '//PIPEDocument' only matches <PIPEDocument> elements in the ‘default’ namespace. Because nothing matches, extract returns null, which Python sees as None.
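The same matching rule can be seen outside Oracle. The sketch below uses Python's standard xml.etree.ElementTree rather than Oracle's extract, so it only illustrates the XPath behaviour. The sample document, the Item element and the urn:XML-PIPE namespace URI are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root declares a namespace, so its children
# belong to that namespace as well.
xml_text = '<PIPEDocument xmlns="urn:XML-PIPE"><Item>42</Item></PIPEDocument>'
root = ET.fromstring(xml_text)

# An unprefixed name only matches elements that are in no namespace.
unprefixed = root.findall(".//Item")

# Mapping a prefix to the namespace URI makes the same path match.
prefixed = root.findall(".//p:Item", {"p": "urn:XML-PIPE"})

print(len(unprefixed), len(prefixed))  # prints: 0 1
```

The prefix itself is arbitrary. Only the URI it is mapped to has to agree with the document.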
If you want to use namespaces with the extract function, you have to do two things:
- add a namespace mapping of a prefix to the namespace URI, using the optional third argument to extract. This third argument is a string formatted in the same way as an XML namespace declaration. Use a prefix in it, such as p, or perhaps pipe.
- add the prefix to the XPath expression in the second argument to extract.
I made these changes to your code, choosing to use the namespace prefix p. I also wrapped the entire SQL string in triple quotes so that I didn't have to escape the quotes inside it. This gave me the following, which returned the desired XML output:
sql = """select extract(xmltype.createxml(xml), '//p:PIPEDocument', '').getStringVal() from table t where id = 7"""
for row in cursor.execute(sql):
    print(row[0])
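As a rough sketch of what the namespace mapping in the third argument could look like, the snippet below builds the query with the mapping spelled out. The URI urn:XML-PIPE is an assumption, taken from the schemaLocation value shown in the question. The helpers build_extract_sql and save_xml and the bind name row_id are hypothetical and not part of cx_Oracle.

```python
def build_extract_sql(xpath, prefix, namespace_uri):
    # Third argument to extract: a prefix-to-URI mapping written like an
    # XML namespace declaration, e.g. xmlns:p="urn:XML-PIPE" (assumed URI).
    ns_mapping = f'xmlns:{prefix}="{namespace_uri}"'
    return (
        "select extract(xmltype.createxml(xml), "
        f"'{xpath}', '{ns_mapping}').getStringVal() "
        "from table t where id = :row_id"
    )


def save_xml(value, path):
    # extract() yields NULL (None in Python) when the XPath matches nothing,
    # so fail with a clear message instead of passing None to write().
    if value is None:
        raise ValueError("query returned NULL; check the XPath and namespace mapping")
    with open(path, "w") as f:
        f.write(value)


sql = build_extract_sql("//p:PIPEDocument", "p", "urn:XML-PIPE")
print(sql)

# With an open cursor (not created here), usage might look like:
# for row in cursor.execute(sql, row_id=7):
#     save_xml(row[0], "output.xml")
```

Passing the id as a bind variable instead of embedding it in the string is a design choice of this sketch, not something the original query did.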
For query performance, you might want to fetch the LOB as a String, see the doc.
For general reference, if the database type had been XMLType, cx_Oracle’s recommendation is to use xmltype.getclobval() to avoid limited XML lengths.
Update: The latest version of cx_Oracle, now called python-oracledb, has a Thin mode that doesn’t have the XML query length limitation.