Reuse function that validates file size [fastapi]

Question:

I’m very new to FastAPI.
I want to validate the file type and the file size of an uploaded file, and raise an exception if the file is larger than the size limit or doesn't match one of the allowed types. The file will then be uploaded to S3.
This is what my code looks like:

@router.post("/upload/", status_code=200, description="***** Upload customer document asset to S3 *****")
async def upload(
        document_type: DocumentEnum,
        customer_id: UUID,
        current_user=Depends(get_current_user),
        fileobject: UploadFile = File(...)
):
    # roll the spooled file over from memory to a file on disk to save memory
    fileobject.file.rollover()
    fileobject.file.flush()

    valid_types = [
        'image/png',
        'image/jpeg',
        'image/bmp',
        'application/pdf'
    ]
    await validate_file(fileobject, 5000000, valid_types)

    # .... Proceed to upload file 

My validate_file function looks like this:

async def validate_file(file: UploadFile, max_size: int = None, mime_types: list = None):
    """
    Validate a file by checking the size and mime types a.k.a file types
    """
    if mime_types and file.content_type not in mime_types:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="You can only upload pdf and image for document"
        )
    if max_size:
        size = await file.read()
        if len(size) > max_size:
            raise HTTPException(
                status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
                detail="File size is too big. Limit is 5mb"
            )

    return file

Now when the file gets uploaded to S3, it’s always 0 bytes in size.
However, if I remove the file size check from the validate_file function, the original file gets uploaded without any problem.
With validate_file written like this, the upload works fine:

async def validate_file(file: UploadFile, max_size: int = None, mime_types: list = None):
    """
    Validate a file by checking the size and mime types a.k.a file types
    """
    if mime_types and file.content_type not in mime_types:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="You can only upload pdf and image for document"
        )

    return file

I’ve no idea why this happens. Thank you in advance for your help.

Asked By: Koushik Das


Answers:

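When you call read() on a file, the file pointer is left at the end of what you read. Called with no size argument, read() consumes the whole file, so the pointer ends up at the end of the file. When you (or the library doing the S3 upload) call read() a second time, there is nothing left to read from that position, so it returns empty bytes. That is why the uploaded file is 0 bytes.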
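
The pointer behaviour can be reproduced without FastAPI. This small sketch uses io.BytesIO as a stand-in for the uploaded file's underlying file object (an assumption for illustration; it behaves like any seekable binary file):

```python
import io

# io.BytesIO stands in for the uploaded file's underlying file object.
f = io.BytesIO(b"%PDF-1.7 example bytes")

first = f.read()   # reads everything; the pointer is now at the end
second = f.read()  # nothing left to read from the current position
print(len(first), second)  # 22 b''

f.seek(0)          # move the pointer back to the start
third = f.read()
print(third == first)  # True
```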

You can use await file.seek(0) to put the file pointer at the start of the file, so that the next read will read the same content again:

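from fastapi import HTTPException, UploadFile, status


async def validate_file(file: UploadFile, max_size: int = None, mime_types: list = None):
    """Validate a file by checking its size and MIME type."""
    if mime_types and file.content_type not in mime_types:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="You can only upload pdf and image for document"
        )
    if max_size:
        contents = await file.read()
        if len(contents) > max_size:
            raise HTTPException(
                status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
                detail="File size is too big. Limit is 5mb"
            )
        # read() left the pointer at the end of the file; rewind it so
        # the S3 upload reads the content again from the start
        await file.seek(0)

    return file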
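
Reading the whole upload just to measure it pulls the entire file into memory. Since the question's code already uses fileobject.file (the spooled temporary file behind UploadFile), one alternative is to seek to the end of that file and ask for the position. This is a minimal sketch assuming the underlying file object is seekable; file_size is a hypothetical helper name, and io.BytesIO stands in for the real upload:

```python
import io
import os
from typing import BinaryIO


def file_size(fileobj: BinaryIO) -> int:
    """Return the size in bytes of a seekable file without reading it into memory.

    The position is reset to the start afterwards (not to wherever it was
    before), so a later read, e.g. the S3 upload, sees the full content.
    """
    fileobj.seek(0, os.SEEK_END)  # jump to the end of the file
    size = fileobj.tell()         # the offset at the end equals the size
    fileobj.seek(0)               # rewind for the next consumer
    return size


# Usage with an in-memory stand-in for UploadFile.file
buf = io.BytesIO(b"x" * 1024)
print(file_size(buf))   # 1024
print(len(buf.read()))  # 1024: the content is still readable
```

Inside validate_file, the size check would then become something like `if max_size and file_size(file.file) > max_size:` followed by the same HTTPException, with no separate rewind needed.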

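You might also want to determine the MIME type of the file yourself instead of trusting the content type the client sends. mimetypes.guess_type can help, but note that it only looks at the file name's extension, which is also supplied by the client. To check the actual content, inspect the file's bytes, for example with a library such as python-magic.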
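
A minimal sketch of checking the file's leading bytes ("magic numbers") with only the standard library, limited to the four types allowed in the question. sniff_mime_type and _SIGNATURES are hypothetical names; the two-byte BMP signature is weak (any data starting with "BM" matches), and a matching signature doesn't prove the rest of the file is valid:

```python
import io
from typing import BinaryIO, Optional

# Leading bytes for the MIME types allowed in the question.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"BM", "image/bmp"),
]


def sniff_mime_type(fileobj: BinaryIO) -> Optional[str]:
    """Guess the MIME type from the file's first bytes, or return None."""
    head = fileobj.read(8)  # the longest signature above is 8 bytes
    fileobj.seek(0)         # rewind so the upload still sees the whole file
    for signature, mime in _SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


print(sniff_mime_type(io.BytesIO(b"%PDF-1.7 ...")))          # application/pdf
print(sniff_mime_type(io.BytesIO(b"\x89PNG\r\n\x1a\n...")))  # image/png
print(sniff_mime_type(io.BytesIO(b"plain text")))            # None
```

With an UploadFile this would be called on its underlying file object (fileobject.file in the question's code), and the result compared against valid_types instead of file.content_type.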

Answered By: MatsLindh