lxml not performing xslt transform

Question:

With this code:

from lxml import etree

with open(r'C:\Python33\projects\xslt', 'r') as xslt, open(r'C:\Python33\projects\result', 'a+') as result, open(r'C:\Python33\projects\xml', 'r') as xml:
    s_xml = xml.read()
    s_xslt = xslt.read()
    transform = etree.XSLT(etree.XML(s_xslt))
    out = transform(etree.XML(s_xml))
    result.write(out)

I get this error:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    from projects.xslt_transform import trans
  File ".\projects\xslt_transform.py", line 17, in <module>
    transform = etree.XSLT(etree.XML(s_xslt))
  File "xslt.pxi", line 409, in lxml.etree.XSLT.__init__ (src\lxml\lxml.etree.c:150256)
lxml.etree.XSLTParseError: Invalid expression

This pair of XML/XSLT files works with other tools.

Also, I had to remove the encoding attribute from the XML declaration at the top of both files to avoid this error:

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Could that be related?

EDIT:

This does not work either (I get the same error):

with open(r'C:\Python33\projects\xslt', 'r', encoding="utf-8") as xslt, open(r'C:\Python33\projects\result', 'a+', encoding="utf-8") as result, open(r'C:\Python33\projects\xml', 'r', encoding="utf-8") as xml:
    s_xml = etree.parse(BytesIO(bytes(xml.read(),'UTF-8')))
    s_xslt = etree.parse(BytesIO(bytes(xslt.read(),'UTF-8')))
    transform = etree.XSLT(s_xslt)
    out = transform(s_xml)
    print(out.tostring())

Reading the lxml source code, this is the call that raises the exception:

xslt.xsltParseStylesheetDoc(c_doc)

So it seems to be an actual parse error. Could it be namespace related?

EDIT SOLVED:

s_xml = etree.parse(xml.read())
s_xslt = etree.parse(xslt.read())

Thanks, Tomalak.

Asked By: user2346536


Answers:

Parsing XML is more complicated than “open a text file, stuff the resulting string into etree”.

XML files are serialized representations of a DOM tree. They are not to be handled as text even though they come in the shape of a text file. They come in multiple byte encodings and finding out which encoding a certain file uses is anything but trivial.

XML parsers have proper encoding detection mechanisms built in, so they should be used to open XML files. The basic open() + read() calls are not enough to handle the file contents correctly.
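
As a minimal sketch of why the parser should see the raw bytes (the sample document and the variable names are made up for illustration): a byte string lets lxml read the encoding from the XML declaration itself, whereas an already-decoded str that still carries the declaration may be rejected with the ValueError quoted in the question.

```python
from lxml import etree

# Hypothetical sample: an ISO-8859-1 document containing a non-ASCII character.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\u00e9</name>'
xml_bytes = text.encode("iso-8859-1")

# Bytes input: the parser reads the declaration and decodes the content itself.
root = etree.fromstring(xml_bytes)
print(root.text)

# str input that still carries an encoding declaration: lxml may refuse it.
try:
    etree.fromstring(text)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

Whether the second call is rejected may depend on the lxml version, which is why the sketch records the outcome instead of assuming it.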

lxml.etree provides the parse() function that can accept a number of argument types:

  • an open file object (make sure to open it in binary mode)
  • a file-like object that has a .read(byte_count) method returning a byte string on each call
  • a filename string
  • an HTTP or FTP URL string

and then will correctly parse the associated document back into a DOM tree.
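
The local input types listed above can be tried out with a small sketch like the following. The file name sample.xml and the document content are placeholders, and the URL case is left out because it needs network access.

```python
import io
import os
import tempfile

from lxml import etree

# Placeholder document used only for this illustration.
XML_BYTES = b'<?xml version="1.0" encoding="UTF-8"?><root><item>1</item></root>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as fh:
        fh.write(XML_BYTES)

    # 1. A filename string.
    tree_from_name = etree.parse(path)

    # 2. An open file object, opened in binary mode.
    with open(path, "rb") as fh:
        tree_from_file = etree.parse(fh)

# 3. Any file-like object whose read() returns bytes.
tree_from_buffer = etree.parse(io.BytesIO(XML_BYTES))

root_tags = [
    tree.getroot().tag
    for tree in (tree_from_name, tree_from_file, tree_from_buffer)
]
print(root_tags)
```

In all three cases the parser receives bytes, so it can apply its own encoding detection.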

Your code should look more like this:

from lxml import etree

# Raw strings keep the backslashes in Windows paths from being read as escapes.
f_xsl = r'C:\Python33\projects\xslt'
f_xml = r'C:\Python33\projects\xml'
f_out = r'C:\Python33\projects\result'

transform = etree.XSLT(etree.parse(f_xsl))
result = transform(etree.parse(f_xml))
result.write(f_out)
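
For readers who want to try this without files on disk, here is a self-contained sketch of the same flow. The stylesheet and input document are invented for illustration, and the result is serialized with str() instead of being written to a file.

```python
from lxml import etree

# Invented stylesheet: emits a plain-text greeting from the name attribute.
XSLT_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/greeting">Hello, <xsl:value-of select="@name"/>!</xsl:template>
</xsl:stylesheet>
"""

# Invented input document.
XML_BYTES = b'<greeting name="world"/>'

# Compile the stylesheet, then apply it to the parsed input.
transform = etree.XSLT(etree.fromstring(XSLT_BYTES))
result = transform(etree.fromstring(XML_BYTES))

# str() serializes the result tree according to xsl:output.
text = str(result)
print(text)
```

Note that the result is a tree object, not a string, so it has to be serialized (for example with str() or its own write() method) before it can go into a text file.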
Answered By: Tomalak

Exceptions raised from lxml\xslt.pxi can indicate a fault in the XSLT stylesheet. Detailed messages can be extracted from the .error_log property; see errors-and-messages (lxml.de).

Answered By: dank8

Detailed error messages produced when lxml\xslt.pxi attempts the XSLT transform are stored in the transform.error_log property. See: errors-and-messages (lxml.de).
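
A minimal sketch of reading those messages follows. The broken stylesheet and the helper name compile_stylesheet are made up for this example. When compilation itself fails, no transform object exists yet, so the sketch reads the error_log attached to the raised exception; after a successful compile, transform.error_log can be inspected in the same way.

```python
from lxml import etree

# Made-up stylesheet with a deliberately invalid XPath expression ("foo[").
BROKEN_XSLT = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="foo["/></out>
  </xsl:template>
</xsl:stylesheet>
"""


def compile_stylesheet(xslt_bytes):
    """Hypothetical helper: return (transform, messages).

    transform is None when the stylesheet fails to compile; messages then
    holds one formatted line per entry in the exception's error log.
    """
    try:
        return etree.XSLT(etree.fromstring(xslt_bytes)), []
    except etree.XSLTParseError as exc:
        messages = [
            "line {}: {}".format(entry.line, entry.message)
            for entry in exc.error_log
        ]
        return None, messages


transform, messages = compile_stylesheet(BROKEN_XSLT)
for message in messages:
    print(message)
```

The log usually names the offending expression and line, which is more useful than the bare "Invalid expression" shown in the traceback.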

Answered By: dank8