What is the difference between upload_file() and put_object() when uploading files to S3 using boto3?

Question:

I’m using boto3 and trying to upload files. It would be helpful if someone could explain the exact difference between the upload_file() and put_object() S3 methods in boto3.

  • Is there any performance difference?
  • Does either of these handle multipart uploads behind the scenes?
  • What are the best use cases for each?
Asked By: Tushar Niras


Answers:

The upload_file method is handled by the S3 Transfer Manager. This means it will automatically handle multipart uploads behind the scenes for you when the file is large enough to need them.

The put_object method maps directly to the low-level S3 API request. It does not handle multipart uploads for you. It will attempt to send the entire body in one request.

Answered By: garnaat

One other difference worth noting is that the upload_file() API lets you track upload progress with a callback function. You can read more about it here.
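A sketch of such a callback, adapted from the progress-tracking pattern in the boto3 documentation: the Callback argument of upload_file() is invoked with the number of bytes transferred since the previous call, possibly from several threads at once, hence the lock. The class name and the usage values are illustrative.

```python
import os
import sys
import threading


class ProgressPercentage:
    """Progress callback for upload_file(..., Callback=ProgressPercentage(path))."""

    def __init__(self, filename):
        self.filename = filename
        self.size = os.path.getsize(filename)
        self.seen_so_far = 0
        self._lock = threading.Lock()  # serialize updates from concurrent worker threads

    @property
    def percentage(self):
        return 100.0 if self.size == 0 else self.seen_so_far * 100.0 / self.size

    def __call__(self, bytes_amount):
        # bytes_amount is the number of bytes sent since the last call, not a running total.
        with self._lock:
            self.seen_so_far += bytes_amount
            sys.stdout.write(
                f"\r{self.filename}  {self.seen_so_far} / {self.size} bytes  ({self.percentage:.2f}%)"
            )
            sys.stdout.flush()


# Hypothetical usage (bucket name and path are placeholders):
# import boto3
# s3 = boto3.client("s3")
# s3.upload_file("/tmp/my_file.json", "my-example-bucket", "my_file.json",
#                Callback=ProgressPercentage("/tmp/my_file.json"))
```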

Also, as boto’s creator @garnaat already mentioned, upload_file() uses multipart uploads behind the scenes for larger files, so it’s not straightforward to check end-to-end file integrity (though there is a way). put_object(), on the other hand, uploads the whole file in one shot (capped at 5 GB), which makes it easier to check integrity by passing Content-MD5, already available as the ContentMD5 parameter of the put_object() API.
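A minimal sketch of the Content-MD5 check, assuming the data fits in memory: S3 expects the base64-encoded (not hex) MD5 digest, and rejects the upload with a BadDigest error if the bytes it receives don't match. The helper names are hypothetical.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, the format S3 expects for Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def put_with_integrity_check(s3_client, bucket, key, data: bytes):
    # If the received body doesn't match this digest, S3 refuses to store the
    # object instead of silently keeping a corrupted copy.
    return s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentMD5=content_md5(data))
```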

Answered By: Pranav Gupta

One other thing to mention is that put_object() takes the object’s content directly through its Body parameter (bytes, a str, or a seekable file-like object), whereas upload_file() requires the path of the file to upload. For example, if I have a JSON file already stored locally, I would use upload_file(Filename='/tmp/my_file.json', Bucket=my_bucket, Key='my_file.json').

Whereas if I had a dict in memory within my job, I could serialize it to JSON and use put_object() like so:

import json
import boto3

records_to_update = {'Name': 'Sally'}
records_to_update_json = json.dumps(records_to_update, default=str)
boto3.client('s3').put_object(Body=records_to_update_json, Bucket=my_bucket, Key='my_records')
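If the in-memory data could be large and you want the Transfer Manager’s multipart handling without writing a temporary file, boto3 also provides upload_fileobj(), which takes a binary file-like object instead of a path. A sketch under that assumption; the helper name and the ContentType value are illustrative choices, not part of the original answer.

```python
import io
import json


def upload_dict_as_json(s3_client, record, bucket, key):
    """Upload an in-memory dict as JSON through the Transfer Manager."""
    body = json.dumps(record, default=str).encode("utf-8")
    # upload_fileobj() wants a readable binary file-like object; like upload_file(),
    # it switches to multipart automatically for large bodies.
    s3_client.upload_fileobj(
        io.BytesIO(body), bucket, key, ExtraArgs={"ContentType": "application/json"}
    )
```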

Answered By: deesolie