Boto3 – Disable automatic multipart upload

Question:

I’m using an S3-compatible backend that doesn’t support MultipartUpload.

I have a strange case: on some servers the upload finishes fine, but on other servers boto3 automatically tries to upload the file using MultipartUpload. For testing purposes I’m uploading exactly the same file, to the same backend, region/tenant, bucket, etc.

As the documentation states, MultipartUpload is enabled automatically when needed:

  • Automatically switching to multipart transfers when a file is over a specific size threshold

Here are logs from both cases.

Log when it automatically switches to MultipartUpload:

DEBUG:botocore.hooks:Event request-created.s3.CreateMultipartUpload: calling handler <function enable_upload_callbacks at 0x2b001b8>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [POST]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"POST /cassandra/samplefile.tgz?uploads HTTP/1.1" 501 None
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 09:12:48 GMT', 'transfer-encoding': 'chunked', 'content-type': 'application/xml;charset=UTF-8', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:
<?xml version='1.0' encoding='UTF-8'?>
<Error>
  <Code>NotImplemented</Code>
  <Message>The request requires functionality that is not implemented in the current release</Message>
  <RequestId>1450429968948</RequestId>
  <HostId>aGRpLmJvc3RoY3AuY2xvdWQuY29ycDoyNg==</HostId>
</Error>     
DEBUG:botocore.hooks:Event needs-retry.s3.CreateMultipartUpload: calling handler <botocore.retryhandler.RetryHandler object at 0x2a490d0>

Log from another server that does not switch to multipart, for the same file:

DEBUG:botocore.hooks:Event request-created.s3.PutObject: calling handler <function enable_upload_callbacks at 0x7f436c025500>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [PUT]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.awsrequest:Waiting for 100 Continue response.
DEBUG:botocore.awsrequest:100 Continue response seen, now sending request body.
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"PUT /cassandra/samplefile.tgz HTTP/1.1" 200 0
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 10:05:25 GMT', 'content-length': '0', 'etag': '"b407e71de028fe62fd9f2f799e606855"', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:

DEBUG:botocore.hooks:Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f436be1ecd0>
DEBUG:botocore.retryhandler:No retry needed.

I’m uploading the file as follows:

import boto3

connection = boto3.client(service_name='s3',
        region_name='',
        api_version=None,
        use_ssl=True,
        verify=True,
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=None,
        config=None)
connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')

The questions are:

  • In order to avoid the automatic switch to a multipart upload, how can I disable MultipartUpload by default, or increase the threshold?
  • Is there any reason why one server uses automatic multipart and another does not, for the same file?
Asked By: RuBiCK


Answers:

I found a workaround: increasing the threshold size using S3Transfer and TransferConfig, as follows:

import boto3
from boto3.s3.transfer import S3Transfer, TransferConfig

myconfig = TransferConfig(
    multipart_threshold=9999999999999999,  # workaround to 'disable' automatic multipart upload
    max_concurrency=10,
    num_download_attempts=10,
)

connection = boto3.client(service_name='s3',
        region_name='',
        api_version=None,
        use_ssl=True,
        verify=True,
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=None,
        config=None)
transfer = S3Transfer(connection, myconfig)

transfer.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')
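
If your boto3 version is recent enough, I believe the same TransferConfig can also be passed straight to the client's upload_file through its Config argument, so the separate S3Transfer object isn't strictly needed (url, access_key and secret_key are the same placeholders as above):

import boto3
from boto3.s3.transfer import TransferConfig

connection = boto3.client(service_name='s3',
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key)

# any file smaller than multipart_threshold is sent as a single PutObject request
connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz',
        Config=TransferConfig(multipart_threshold=9999999999999999))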

I hope this helps someone.

Answered By: RuBiCK

While I was reading about boto3, I came across your question:

Automatically switching to multipart transfers when a file is over a
specific size threshold??

Yes, upload_file (whether called from the client, the resource, or S3Transfer) will automatically switch to a multipart upload; the default threshold size is 8 MB.
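
If you want to double-check the default on your installed boto3, it is exposed (as far as I know) as the multipart_threshold attribute of TransferConfig:

from boto3.s3.transfer import TransferConfig

# the library default threshold; files this size or larger go through multipart
print(TransferConfig().multipart_threshold)  # 8388608 bytes (8 MB)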

If you don’t want multipart, then don’t use the upload_file method; just use the put_object method, which does not do multipart.

import boto3

client = boto3.client('s3')

client.put_object(Body=open('/test.csv', 'rb'), Bucket='mybucket', Key='test.csv')

Answered By: Samarendra

Yes, the Minimum Part size for multipart upload is by default 5 MiB (see S3-compatible MinIO server code).

But this setting is freely customizable on the client side, and in the case of MinIO servers (which have a larger globalMaxObjectSize), it can be increased to as much as 5 TiB.

Using the Python minio client (connected either to an S3 or a MinIO server), we can customize the minimum part size with the part_size argument of fput_object, like this:

# default setting of `globalMinPartSize` is 5 MiB:
# multipart_size_bytes = 5 * (1024)**2

# but here we increase it 10-fold:
multipart_size_bytes = 50 * (1024)**2

# and then we can upload a 50 MiB file to a S3 / minio bucket in one chunk
minio_client.fput_object([..],
                         part_size=multipart_size_bytes,
                         [..])
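
For completeness, here is a self-contained sketch of the same call; the endpoint, credentials and object names below are placeholders (assuming the minio Python package is installed):

from minio import Minio

# placeholder endpoint and credentials
minio_client = Minio('mytenant.mys3backend.cloud.corp',
                     access_key='MY_ACCESS_KEY',
                     secret_key='MY_SECRET_KEY',
                     secure=True)

# with part_size at least as large as the file, the object is uploaded in one chunk
minio_client.fput_object('mybucket',
                         'remotefile.tgz',
                         '/tmp/samplefile.tgz',
                         part_size=50 * 1024**2)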
Answered By: mirekphd