What is the difference between upload_file() and put_object() when uploading files to S3 using boto3?
Question:
I'm using boto3 and trying to upload files. It would be helpful if someone could explain the exact difference between the upload_file() and put_object() S3 bucket methods in boto3.
- Is there any performance difference?
- Does either of them handle multipart uploads behind the scenes?
- What are the best use cases for both?
Answers:
The upload_file method is handled by the S3 Transfer Manager, which means that it will automatically handle multipart uploads behind the scenes for you, if necessary.
The put_object method maps directly to the low-level S3 API request. It does not handle multipart uploads for you. It will attempt to send the entire body in one request.
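To make the contrast concrete, here is a minimal sketch of the two calls side by side. The helper names upload_small_payload and upload_local_file are made up for illustration, and s3_client is assumed to be a client created with boto3.client("s3"); only put_object and upload_file themselves are boto3 methods.

```python
def upload_small_payload(s3_client, bucket, key, data):
    """Send `data` (bytes) as a single PutObject request.

    The whole body travels in one HTTP call; nothing is split into parts.
    """
    return s3_client.put_object(Bucket=bucket, Key=key, Body=data)


def upload_local_file(s3_client, path, bucket, key):
    """Upload the file at `path` through the managed transfer layer.

    boto3 decides whether to use a multipart upload based on the file size.
    """
    s3_client.upload_file(Filename=path, Bucket=bucket, Key=key)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   s3_client = boto3.client("s3")
#   upload_small_payload(s3_client, "my-bucket", "notes.txt", b"hello")
#   upload_local_file(s3_client, "/tmp/big.bin", "my-bucket", "big.bin")
```

If the multipart behaviour needs tuning, upload_file also accepts a Config argument (a boto3.s3.transfer.TransferConfig), which is not shown here.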
One other difference worth noting is that upload_file() lets you track upload progress with a callback function, passed through its Callback parameter.
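As a rough illustration of the callback mechanism, the sketch below defines a small callable that sums up the byte counts boto3 passes to it during a transfer. The class name ProgressTracker and its percent property are made up for this example; the only boto3-specific assumption is that the callback is invoked with the number of bytes transferred since the previous call.

```python
import threading


class ProgressTracker:
    """Callable that accumulates the byte counts reported during a transfer."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.seen_bytes = 0
        # Callbacks may arrive from several worker threads at once.
        self._lock = threading.Lock()

    def __call__(self, bytes_transferred):
        with self._lock:
            self.seen_bytes += bytes_transferred

    @property
    def percent(self):
        if self.total_bytes == 0:
            return 100.0
        return 100.0 * self.seen_bytes / self.total_bytes


# Hypothetical usage (requires boto3 and AWS credentials):
#   import os, boto3
#   tracker = ProgressTracker(os.path.getsize("/tmp/big.bin"))
#   boto3.client("s3").upload_file(
#       "/tmp/big.bin", "my-bucket", "big.bin", Callback=tracker
#   )
#   print(tracker.percent)
```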
Also, as already mentioned by boto's creator @garnaat, upload_file() uses multipart uploads behind the scenes, so it is not straightforward to check end-to-end file integrity (though a way exists). put_object(), by contrast, uploads the whole file in one shot (capped at 5 GB), which makes it easier to check integrity by passing Content-MD5, which put_object() already exposes as its ContentMD5 parameter.
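A minimal sketch of the integrity check with put_object(), assuming the payload is already in memory as bytes. The Content-MD5 header carries the base64-encoded binary MD5 digest (not the hex string). The helper names content_md5 and put_with_md5 are made up for illustration, and s3_client is assumed to come from boto3.client("s3").

```python
import base64
import hashlib


def content_md5(data):
    """Return the base64-encoded MD5 digest of `data`, as Content-MD5 expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def put_with_md5(s3_client, bucket, key, data):
    """Upload `data` in one request and let S3 verify it against the digest.

    S3 rejects the request if the body it receives does not match ContentMD5.
    """
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ContentMD5=content_md5(data),
    )
```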
One other thing to mention is that put_object() takes the object's content directly through its Body parameter (bytes or a file-like object), whereas upload_file() requires the path of the file to upload. For example, if I have a JSON file already stored locally, then I would use upload_file(Filename='/tmp/my_file.json', Bucket=my_bucket, Key='my_file.json').
Whereas if I had a dict within my job, I could transform the dict into JSON and use put_object() like so:
import json
import boto3
s3_client = boto3.client('s3')
records_to_update = {'Name': 'Sally'}
records_to_update_json = json.dumps(records_to_update, default=str)
s3_client.put_object(Body=records_to_update_json.encode('utf-8'), Bucket=my_bucket, Key='my_records')