Python GET REST API – package is downloaded but I cannot open it (invalid zip)

Question:

I must run a Python script to fetch some artifacts from a repository using the following syntax (it is invoked from a batch file with its variables), so the way the arguments are passed cannot be changed.

python get_artifacts.py %USERNAME%:%PASSWORD% http://url/artifactory/package.zip

My python script is the following:

import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password, chunk_size=128):
    try:
        get_method = requests.get(url, 
                        auth = HTTPBasicAuth(username, password), stream=True)

        with open(save_artifact_name, 'wb') as artifact:
            for chunk in get_method.iter_content(chunk_size=chunk_size):
                artifact.write(chunk)

    except requests.exceptions.RequestException as error:
        sys.exit(str(error))

if __name__ == '__main__':

    username_and_password = sys.argv[1].split(':')
    username = username_and_password[0]
    password = username_and_password[1]

    url = sys.argv[2]
    save_artifact_name = url.split("/")[-1]

    print(f'Retrieving artifact {save_artifact_name}...')
    get_artifact(url, save_artifact_name, username, password)
    print("Finished successfully!")

Now I can see that my package is downloaded, but the zip file is invalid.
The same download works with other tools such as curl.exe.
So I am definitely missing something in the Python script, but I cannot determine what it is (the download works but the package is invalid).

Thanks a lot!

Asked By: AndreyS

Answers:

You’re streaming the file 128 bytes at a time and writing each chunk to the file as it arrives. I suspected the file was being written anew on each chunk, leaving only the last chunk behind, although with open() outside the loop each write should append to the same file. In any case, unless the file is very large, you should be able to simply load the entire thing into memory and then write it out. Here’s the modified version:

import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password):
    try:
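        get_method = requests.get(url,
            auth = HTTPBasicAuth(username, password))
        # Fail on 4xx/5xx instead of saving the error page as the artifact.
        get_method.raise_for_status()

        with open(save_artifact_name, 'wb') as artifact:
            # .content holds the whole response body in memory.
            artifact.write(get_method.content)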

    except requests.exceptions.RequestException as error:
        sys.exit(str(error))

if __name__ == '__main__':

    # Split on the first ':' only, so a password containing ':' stays intact.
    username, _, password = sys.argv[1].partition(':')

    url = sys.argv[2]
    save_artifact_name = url.split("/")[-1]

    print(f'Retrieving artifact {save_artifact_name}...')
    get_artifact(url, save_artifact_name, username, password)
    print("Finished successfully!")

That should fetch the entire file in one go and write it to your output. I’ve just tested this with a 5MB test file I found online and it downloaded just lovely.

The chunk size is no longer needed as you’re not downloading in chunks. 🙂
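
Because curl produces a valid archive from the same URL, it can help to look at what the script actually wrote to disk. The following is a minimal sketch using only the standard library; `describe_download` is a hypothetical helper, not part of the original script. If the saved file is not a ZIP and the preview shows HTML or an error message, the server probably returned something other than the artifact (a failed login, for example). That is an assumption worth checking, not a confirmed diagnosis.

```python
import zipfile


def describe_download(path, preview_bytes=200):
    """Return (is_zip, preview) for a file saved by the download script.

    is_zip  -- True if the file has a valid ZIP structure
    preview -- the first few raw bytes, useful for spotting an HTML error page
    """
    with open(path, 'rb') as saved:
        preview = saved.read(preview_bytes)
    return zipfile.is_zipfile(path), preview
```

For instance, calling `describe_download('package.zip')` after running the script shows whether the saved file is a real archive and what its first bytes look like.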

Answered By: Thickycat

Here’s an answer that is much closer to the original and keeps the chunked download, so it works with minimal memory. It simply opens the output file before making the request:

import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password, chunk_size=128):
    try:

        with open(save_artifact_name, 'wb') as artifact:
            get_method = requests.get(url,
                        auth = HTTPBasicAuth(username, password), stream=True)
            # Fail on 4xx/5xx instead of streaming the error page to disk.
            get_method.raise_for_status()
            for chunk in get_method.iter_content(chunk_size=chunk_size):
                artifact.write(chunk)

    except requests.exceptions.RequestException as error:
        sys.exit(str(error))

if __name__ == '__main__':

    # Split on the first ':' only, so a password containing ':' stays intact.
    username, _, password = sys.argv[1].partition(':')

    url = sys.argv[2]
    save_artifact_name = url.split("/")[-1]

    print(f'Retrieving artifact {save_artifact_name}...')
    get_artifact(url, save_artifact_name, username, password)
    print("Finished successfully!")
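
A variation on the chunked approach, offered as a sketch rather than a tested fix for this particular server: check the HTTP status before creating the output file, and count the bytes written so the result can be compared with the expected size. `save_stream` is a hypothetical helper name. It assumes `response` behaves like the object returned by `requests.get(..., stream=True)`, which provides `raise_for_status()` and `iter_content()`. The chunk size of 8192 is an arbitrary choice.

```python
def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to `path` and return the bytes written.

    `response` is assumed to come from something like:
        requests.get(url, auth=HTTPBasicAuth(username, password), stream=True)
    """
    # Raise on 4xx/5xx before opening the file, so an error page
    # is never saved under the artifact's name.
    response.raise_for_status()

    written = 0
    with open(path, 'wb') as artifact:
        for chunk in response.iter_content(chunk_size=chunk_size):
            artifact.write(chunk)
            written += len(chunk)
    return written
```

Since `raise_for_status()` raises a subclass of `requests.exceptions.RequestException`, the existing `except` clause in `get_artifact` would still catch a failed request.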

Answered By: Thickycat