How to upload a file in FastAPI and convert it into a Pandas DataFrame?
Question:
I would like to upload a file to a FastAPI backend and convert it into a Pandas DataFrame. However, I don’t seem to understand how to do that using FastAPI’s UploadFile object. More specifically, what should I pass to the pd.read_csv() function?
Here is my FastAPI endpoint:
@app.post("/upload")
async def upload_file(file: UploadFile):
    df = pd.read_csv("")
    print(df)
    return {"filename": file.filename}
Answers:
Below are several options for converting a file uploaded to FastAPI into a Pandas DataFrame. If you would also like to convert the DataFrame into JSON and return it to the client, have a look at this answer. If you would like to use an async def endpoint instead of def, please have a look at this answer on how to read the file contents in an async way, as well as this answer to understand the difference between using def and async def. It is also best to enclose the I/O operations in the examples below in a try-except-finally block (as shown here and here), so that you can catch or raise any exceptions and close the file properly, which releases the object from memory and avoids potential errors.
Option 1
Since pandas.read_csv() can accept a file-like object, you can pass the file-like object of UploadFile directly. UploadFile exposes an actual Python SpooledTemporaryFile that you can get using the .file attribute. An example is given below. Note: pd.read_csv() isn’t an async function; hence, if you are going to use an async def endpoint, it would be better to read the contents of the file using an async method, as described here, and then pass the contents to pd.read_csv() using one of the remaining options below. Alternatively, you can use Starlette’s run_in_threadpool() (as described here), which will run pd.read_csv(file.file) in a separate thread so that the main thread (where coroutines run) does not get blocked.
from fastapi import FastAPI, File, UploadFile
import pandas as pd

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    df = pd.read_csv(file.file)
    file.file.close()
    return {"filename": file.filename}
Option 2
Convert the bytes into a string and then load it into an in-memory text buffer (i.e., StringIO), which can be converted into a DataFrame:
from fastapi import FastAPI, File, UploadFile
import pandas as pd
from io import StringIO

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    contents = file.file.read()
    s = str(contents, 'utf-8')
    data = StringIO(s)
    df = pd.read_csv(data)
    data.close()
    file.file.close()
    return {"filename": file.filename}
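One reason to decode the bytes yourself, as in Option 2, is that you control the text encoding. The helper below is a hypothetical illustration of that step in isolation (the function name and the sample data are assumptions, not part of the original answer); it can be called from an endpoint with the bytes read from the upload.

```python
from io import StringIO

import pandas as pd


def csv_bytes_to_dataframe(contents: bytes, encoding: str = "utf-8") -> pd.DataFrame:
    """Decode raw CSV bytes with the given encoding and parse them into a DataFrame."""
    text = contents.decode(encoding)
    # The with-block closes the in-memory text buffer once parsing is done
    with StringIO(text) as buffer:
        return pd.read_csv(buffer)


# Sample bytes in a non-UTF-8 encoding (made-up data for illustration)
raw = "name,city\nZoë,Köln\n".encode("latin-1")
df = csv_bytes_to_dataframe(raw, encoding="latin-1")
print(df)
```

Decoding these same bytes as UTF-8 would raise a UnicodeDecodeError, so in an endpoint the decode step belongs inside the try block as well.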
Option 3
Use an in-memory bytes buffer instead (i.e., BytesIO), thus saving you the step of converting the bytes into a string as shown in Option 2:
from fastapi import FastAPI, File, UploadFile
import pandas as pd
from io import BytesIO

app = FastAPI()

@app.post("/upload")
def upload_file(file: UploadFile = File(...)):
    contents = file.file.read()
    data = BytesIO(contents)
    df = pd.read_csv(data)
    data.close()
    file.file.close()
    return {"filename": file.filename}