AttributeError: __enter__ when passing .xml via HTTP POST to pd.read_xml()

Question:

I’m using Python, pandas, and Flask for some postprocessing tasks (analysis and visualization). Until now I have uploaded and read *.csv, *.xlsx, and *.xls files via pd.read_csv and pd.read_excel. Everything worked quite well.

Now I have a *.xml file as a data source and tried the same pattern I used before.

So I tried:

<form action="/input" method="POST" enctype="multipart/form-data">
  <input class="form-control" type="file" name="file">
  <input type="submit" class="btn btn-outline-secondary" name="Preview" value="Preview Data">
</form>

from flask import Flask, render_template, request
import pandas as pd
import xml.etree.ElementTree as ET

app = Flask(__name__)

@app.route("/input", methods=['POST', 'GET'])
def input():
    if request.method == 'POST':
        if request.form['Preview'] == "Preview Data":
            # request.files['file'] is a werkzeug FileStorage object
            file = request.files['file']
            filename = file.filename
            if '.xml' in filename:
                # this call raises the error shown below
                content = pd.read_xml(file, parser='lxml')

But when I pass a .xml file to the app via the form, I get the error:

File "C:\ProgramData\Miniforge\Envs\TestEnv\lib\site-packages\pandas\io\xml.py", line 627, in _parse_doc
    with preprocess_data(handle_data) as xml_data:
AttributeError: __enter__

I tried different options:

  1. When I use the built-in xml.etree package, it works fine:
import xml.etree.ElementTree as ET

if '.xml' in filename:
    tree = ET.parse(file)
    root = tree.getroot()  
    print(root[1][0][1].attrib)


  2. When I load the .xml directly from the app directory into pd.read_xml(), it also works fine:
if '.xml' in filename:
    content = pd.read_xml('SampleExport.xml', parser='lxml')
  3. I tried different parsers: "lxml" and "etree".

But in the end, when I pass the .xml via the form input and use pd.read_xml(file, parser='lxml'), I get the error above.

Asked By: To0bias


Answers:

I just solved my issue, even though I’m not quite sure why pd.read_xml() behaves differently from pd.read_csv() or pd.read_excel().

pd.read_xml is not able to read a FileStorage object. The value returned by request.files[] is an instance of the class werkzeug.datastructures.FileStorage(stream=None, filename=None, name=None, content_type=None, content_length=None, headers=None).

Via the read() method I extracted the file content itself:

filestorage = request.files['file']
file = filestorage.read()

With this passed to pd.read_xml, it works fine.
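For reference, here is a minimal sketch of the same idea that wraps the uploaded bytes in io.BytesIO before handing them to pd.read_xml. As far as I know, recent pandas releases deprecate passing literal XML strings or bytes directly, so a buffer is probably the safer form. The helper name read_uploaded_xml and the sample XML are made up for illustration; xml_bytes stands in for what request.files['file'].read() would return, and parser="etree" is used only so the sketch does not depend on lxml.

```python
import io

import pandas as pd

# Stand-in for request.files['file'].read(); the XML content is invented.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<data>
  <row><name>a</name><value>1</value></row>
  <row><name>b</name><value>2</value></row>
</data>"""


def read_uploaded_xml(raw: bytes) -> pd.DataFrame:
    """Parse uploaded XML bytes into a DataFrame (hypothetical helper)."""
    # BytesIO is a real file-like object that supports the with statement,
    # which is what read_xml needs internally.
    return pd.read_xml(io.BytesIO(raw), parser="etree")


content = read_uploaded_xml(xml_bytes)
print(content)
```

In the Flask view this would be called as read_uploaded_xml(request.files['file'].read()).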

Can anybody explain why the _parse_doc() function of pd.read_xml() is not able to read the FileStorage type?
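One likely explanation, based on the traceback: _parse_doc() uses the object in a with statement, and Python looks up __enter__ and __exit__ on the object's type, not through __getattr__. Assuming FileStorage only forwards ordinary attribute access to its underlying stream (which is my understanding of werkzeug, but treat it as an assumption), calls like .read() work while with fails. The sketch below reproduces that behaviour with a hypothetical UploadWrapper class that is not werkzeug's actual code.

```python
import io


class UploadWrapper:
    """Hypothetical stand-in for an upload wrapper such as FileStorage."""

    def __init__(self, stream):
        self.stream = stream

    def __getattr__(self, name):
        # Ordinary attribute access (read, seek, ...) is forwarded to the stream.
        return getattr(self.stream, name)


wrapped = UploadWrapper(io.BytesIO(b"<data/>"))

# Explicit method calls go through __getattr__, so this works.
first_bytes = wrapped.read()

# The with statement looks up __enter__/__exit__ on the type and bypasses
# __getattr__, so the wrapper is rejected.
try:
    with wrapped:
        pass
    with_failed = False
except (AttributeError, TypeError) as exc:
    # Older Python versions raise AttributeError: __enter__,
    # newer ones raise a TypeError instead.
    with_failed = True
    print(type(exc).__name__, exc)

# The underlying stream itself is a proper context manager.
wrapped.stream.seek(0)
with wrapped.stream as fh:
    via_stream = fh.read()
```

If this is what happens, it would also explain why ET.parse(file) works: it only needs the object to have a read() method and never uses it as a context manager.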

Answered By: To0bias