How to upload a file in FastAPI and convert it into a Pandas DataFrame?

Question:

I would like to upload a file to a FastAPI backend and convert it into a Pandas DataFrame. However, I don’t seem to understand how to do that using FastAPI’s UploadFile object. More specifically, what should I pass to the pd.read_csv() function?

Here is my FastAPI endpoint:

@app.post("/upload")
async def upload_file(file: UploadFile):
    df = pd.read_csv("")
    print(df)
    return {"filename": file.filename}
Asked By: Abhishek Vashishth


Answers:

Below are various options for converting a file uploaded to FastAPI into a Pandas DataFrame. If you would also like to convert the DataFrame into JSON and return it to the client, have a look at this answer. If you would like to use an async def endpoint instead of def, please have a look at this answer on how to read the file contents in an async way, as well as this answer to understand the difference between def and async def. It would also be best to enclose the I/O operations (in the examples below) in a try-except-finally block (as shown here and here), so that you can catch/raise any possible exceptions and close the file properly, releasing the object from memory and avoiding potential errors.
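The try-except-finally advice above can be sketched as a small helper. The name read_csv_and_close is hypothetical (not part of FastAPI or pandas); it accepts any binary file-like object, such as UploadFile's .file attribute, and closes it whether or not parsing succeeds. Which exceptions to catch is an assumption: pandas raises pd.errors.EmptyDataError for an empty upload and pd.errors.ParserError for malformed rows, and a non-UTF-8 file can raise UnicodeDecodeError.

```python
from io import BytesIO
from typing import BinaryIO

import pandas as pd


def read_csv_and_close(fileobj: BinaryIO) -> pd.DataFrame:
    """Parse a CSV from a binary file-like object and always close it afterwards.

    `fileobj` plays the role of UploadFile.file (a SpooledTemporaryFile).
    """
    try:
        return pd.read_csv(fileobj)
    except (pd.errors.EmptyDataError, pd.errors.ParserError, UnicodeDecodeError) as exc:
        # Re-raise as a single error type that the endpoint can map to an HTTP 400.
        raise ValueError(f"could not parse uploaded CSV: {exc}") from exc
    finally:
        # Runs on success and on failure, so the file object is always released.
        fileobj.close()


if __name__ == "__main__":
    print(read_csv_and_close(BytesIO(b"a,b\n1,2\n3,4\n")))
```

In an endpoint, the ValueError would typically be converted into an HTTPException with status code 400.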

Option 1

Since pandas.read_csv() can accept a file-like object, you can pass the file-like object of UploadFile directly. UploadFile exposes an actual Python SpooledTemporaryFile that you can get using the .file attribute. An example is given below. Note: pd.read_csv() is not an async method; hence, if you are going to use an async def endpoint, it would be better to read the contents of the file using an async method, as described here, and then pass the contents to pd.read_csv() using one of the remaining options below. Alternatively, you can use Starlette's run_in_threadpool() (as described here), which runs pd.read_csv(file.file) in a separate thread, so that the main thread (where coroutines are run) does not get blocked.

from fastapi import FastAPI, File, UploadFile
import pandas as pd

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    df = pd.read_csv(file.file)
    file.file.close()
    return {"filename": file.filename}
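As noted above, pd.read_csv() is blocking, so in an async def endpoint it can be pushed to a worker thread. The sketch below uses the standard library's asyncio.to_thread() (Python 3.9+) as a stand-in for Starlette's run_in_threadpool(), so that it runs without FastAPI installed. The helper name parse_csv_off_event_loop is hypothetical, and fileobj plays the role of UploadFile's .file attribute.

```python
import asyncio
from io import BytesIO
from typing import BinaryIO

import pandas as pd


async def parse_csv_off_event_loop(fileobj: BinaryIO) -> pd.DataFrame:
    """Parse a CSV in a worker thread so the event loop is not blocked.

    asyncio.to_thread stands in for Starlette's run_in_threadpool here;
    `fileobj` plays the role of UploadFile.file.
    """
    try:
        # pd.read_csv(fileobj) runs in a separate thread; this coroutine only awaits it.
        return await asyncio.to_thread(pd.read_csv, fileobj)
    finally:
        # Close the file whether parsing succeeded or raised.
        fileobj.close()


if __name__ == "__main__":
    demo = BytesIO(b"name,score\nalice,3\nbob,5\n")
    print(asyncio.run(parse_csv_off_event_loop(demo)))
```

Inside a FastAPI endpoint, the equivalent call would be df = await run_in_threadpool(pd.read_csv, file.file) (run_in_threadpool can be imported from starlette.concurrency or fastapi.concurrency), with await file.close() in the finally clause.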

Option 2

Convert the bytes into a string and then load it into an in-memory text buffer (i.e., StringIO), which can be converted into a DataFrame:

from fastapi import FastAPI, File, UploadFile
import pandas as pd
from io import StringIO

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    contents = file.file.read()
    s = str(contents, 'utf-8')
    data = StringIO(s)
    df = pd.read_csv(data)
    data.close()
    file.file.close()
    return {"filename": file.filename}
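Option 2 can also be wrapped in a reusable helper that exposes the encoding, since decoding with 'utf-8' raises UnicodeDecodeError for files saved in another encoding such as Latin-1. This is a sketch: csv_bytes_to_dataframe is a hypothetical name, and contents.decode(encoding) is used as the equivalent of str(contents, encoding). In the endpoint it would be called as csv_bytes_to_dataframe(file.file.read()).

```python
from io import StringIO

import pandas as pd


def csv_bytes_to_dataframe(contents: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Option 2 as a helper: bytes -> str -> StringIO -> DataFrame."""
    # Same result as str(contents, encoding); raises UnicodeDecodeError on a wrong encoding.
    text = contents.decode(encoding)
    # The with-block closes the in-memory text buffer once parsing is done.
    with StringIO(text) as data:
        return pd.read_csv(data)


if __name__ == "__main__":
    print(csv_bytes_to_dataframe(b"city,temp\nOslo,4\nRome,18\n"))
    print(csv_bytes_to_dataframe("name\nJos\u00e9\n".encode("latin-1"), encoding="latin-1"))
```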

Option 3

Use an in-memory bytes buffer instead (i.e., BytesIO), thus saving you the step of converting the bytes into a string as shown in Option 2:

from fastapi import FastAPI, File, UploadFile
import pandas as pd
from io import BytesIO

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    contents = file.file.read()
    data = BytesIO(contents)
    df = pd.read_csv(data)
    data.close()
    file.file.close()
    return {"filename": file.filename}
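For an async def endpoint, Option 3 can be adapted to read the contents with an async method first and then pass them to pd.read_csv(). In this sketch, read_upload_to_dataframe and DemoUpload are hypothetical names; upload is assumed to behave like FastAPI's UploadFile, whose read() and close() methods are coroutines, and DemoUpload exists only so the example runs without FastAPI. Only the read is asynchronous here; pd.read_csv() itself still runs on the event loop.

```python
import asyncio
from io import BytesIO

import pandas as pd


async def read_upload_to_dataframe(upload) -> pd.DataFrame:
    """Option 3 with an async read.

    `upload` is assumed to provide coroutine methods read() and close(), like UploadFile.
    """
    try:
        contents = await upload.read()  # await the read instead of calling file.file.read()
    finally:
        await upload.close()
    # pd.read_csv is synchronous: it parses the in-memory bytes on the event loop thread.
    with BytesIO(contents) as data:
        return pd.read_csv(data)


class DemoUpload:
    """Hypothetical stand-in for UploadFile so the sketch runs without FastAPI."""

    def __init__(self, data: bytes) -> None:
        self._buf = BytesIO(data)
        self.closed = False

    async def read(self) -> bytes:
        return self._buf.read()

    async def close(self) -> None:
        self._buf.close()
        self.closed = True


if __name__ == "__main__":
    print(asyncio.run(read_upload_to_dataframe(DemoUpload(b"x,y\n1,2\n3,4\n"))))
```

In an endpoint declared as async def upload_file(file: UploadFile = File(...)), the helper would be called as df = await read_upload_to_dataframe(file).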
Answered By: Chris