Why does using .decode() consume both instances of a file? Python Flask

Question:

I am retrieving a CSV file from an HTML form and decoding it with UTF-8. I need two instances of this file in my program for various reasons, but when I use .decode('utf-8'), both instances of the file are consumed by the .decode() function.

Python code:

if request.method == 'POST':
    # get the uploaded file
    file = request.files['file']
    file_copy = file
    bank_selection = request.form['banks']

    line_num = banks.get_line_no(file_copy)

For some reason, the .decode() call in get_line_no consumes both file and file_copy:

def get_line_no(file):
    file_data = file.read().decode('utf-8')

    for line_num, row in enumerate(file_data.split('\n')):
        if ',' in row and row.split(',')[0] == 'Date':
            break
    print(line_num)

    return line_num

This does not work, because when I later try

dataframe = pd.read_csv(file, skiprows=line_num)

pandas raises an EmptyDataError, because both the original file and file_copy have been consumed by .decode().

The only way I've gotten it to work is by having the user upload the file twice in the HTML form and retrieving each copy into a different variable:

file = request.files['file']
file_copy = request.files['file2']

Why is this happening?
Is there a way to send two copies of the file from the HTML form without the user having to select it twice?

Asked By: Nicholas Coetzee


Answers:

file_copy = file doesn't create a new copy of the file or the file handle; it just binds the existing file handle referred to by file to a second name, file_copy. Reading from a handle advances its position, so once file.read() has read to the end, file_copy is exhausted too, because it is the same handle. It is read(), not decode(), that consumes the data; decode() only converts the bytes that read() already returned. Instead, keep a copy of the file data itself, e.g.:

if request.method == 'POST':
    # get the uploaded file
    file = request.files['file']
    file_data = file.read()
    bank_selection = request.form['banks']

    line_num = get_line_no(file_data)
    # after the function returns, file_data will still be available (and undecoded)

def get_line_no(file_data):
    decoded = file_data.decode('utf-8')

    for line_num, row in enumerate(decoded.split('\n')):
        if ',' in row and row.split(',')[0] == 'Date':
            break
    print(line_num)

    return line_num
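The aliasing described at the start of this answer can be reproduced without Flask. This sketch uses io.BytesIO as a stand-in for the uploaded file (an assumption; in Flask, request.files['file'] is a Werkzeug FileStorage, which delegates read() and seek() to its underlying stream). It shows that read() is what moves the shared position to the end, and that seek(0) rewinds it if you really do want to read the same handle twice.

```python
import io

# Stand-in for request.files['file']: an in-memory binary stream (made-up CSV)
file = io.BytesIO(b"Bank export\nDate,Amount\n2023-01-01,10\n")
file_copy = file  # binds a second name; no copy of the stream is made

print(file_copy is file)  # True: one object, one read position

first = file.read()        # reads to the end and leaves the position at EOF
second = file_copy.read()  # same object, so there is nothing left to read
print(second)              # b''

# decode() works on the bytes already returned; it never touches the stream
text = first.decode("utf-8")

# If the handle itself must be read again, rewind it to the start
file.seek(0)
again = file_copy.read()
print(again == first)      # True
```

Rewinding avoids asking the user to upload the file twice, but keeping the bytes in a variable, as the answer does, is simpler when the file is small enough to hold in memory.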

If you also need to read the same data with pd.read_csv(), you can wrap the decoded text in a StringIO object so that it can be read like a file:

from io import StringIO

dataframe = pd.read_csv(StringIO(file_data.decode('utf-8')), skiprows=line_num)
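Putting the pieces together, here is a self-contained sketch of the read-once approach. Assumptions: the upload contents are made up, get_line_no is a local copy of the answer's helper that splits the decoded text on '\n', and the standard csv module stands in for pandas so the example has no third-party dependency. islice(..., line_num, None) plays roughly the role of skiprows=line_num.

```python
import csv
import io
from itertools import islice


def get_line_no(file_data: bytes) -> int:
    """Return the 0-based index of the first line whose first field is 'Date'."""
    decoded = file_data.decode("utf-8")
    for line_num, row in enumerate(decoded.split("\n")):
        if "," in row and row.split(",")[0] == "Date":
            break
    return line_num


# Made-up upload contents: two preamble lines before the real header row
file_data = b"Account statement\nGenerated 2023-01-31\nDate,Amount\n2023-01-01,10\n"

line_num = get_line_no(file_data)  # a bytes object is not consumed by reading it
text = file_data.decode("utf-8")

# Each StringIO is an independent in-memory file with its own position,
# so the same text can be parsed as many times as needed.
# islice(..., line_num, None) skips the preamble lines before the header.
first_pass = list(csv.reader(islice(io.StringIO(text), line_num, None)))
second_pass = list(csv.reader(islice(io.StringIO(text), line_num, None)))

print(line_num)                   # 2
print(first_pass)                 # [['Date', 'Amount'], ['2023-01-01', '10']]
print(first_pass == second_pass)  # True
```

With pandas installed, the answer's pd.read_csv(StringIO(...), skiprows=line_num) call follows the same pattern; pd.read_csv should also accept io.BytesIO(file_data) directly, which skips the explicit decode.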
Answered By: Grismar