urllib: specifying download path makes url invalid

Question:

I am trying to write a function that takes a url and a path and downloads a file to that path IF it’s a text file.

import urllib
import re
import os


mcBethURL = 'https://ia802707.us.archive.org/1/items/macbeth02264gut/0ws3410.txt'

def download_file(url, path, local_filename):
    try:
        url_type = urllib.request.urlopen(url).info()['content-type']
        if bool(re.search('t[e]*xt', url_type)):
            local_filename = url.split('/')[-1]
            location = os.path.join("/{}/{}".format(path, local_filename))
            urllib.request.urlretrieve(url, path, filename=local_filename)
        else:
            print('No text file found at given URL, download aborted!')
    # some more exceptions here yet not relevant
    except:
        print('invalid url')

download_file(mcBethURL, '/home/wilma/PycharmProjects/Uni', 'mcBeth')

urllib.request.urlretrieve(url, path, filename=local_filename) doesn’t work: it prints 'invalid url'. urllib.request.urlretrieve(url, filename=local_filename) works, but then I cannot specify a path. I added the path parameter after reading "How to download to a specific directory?".

Do you have an idea why I cannot call urlretrieve with both a path variable and a name for the file in which the download should be saved?

Asked By: Wilma

||

Answers:

Looking at "What command to use instead of urllib.request.urlretrieve?", it seems urllib.request.urlretrieve is part of a legacy interface that the docs say might become deprecated, so you might consider using shutil.copyfileobj or requests.get instead. This example from the docs is relevant for the legacy interface you are using:

import urllib.request
local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()
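For the non-legacy route mentioned above, a minimal sketch with shutil.copyfileobj might look like the following. The function name save_url_to_directory and its parameters are hypothetical, chosen only for illustration, and the sketch assumes the target directory already exists.

```python
import os
import shutil
import urllib.request


def save_url_to_directory(url, directory, filename):
    """Stream the content at url into directory/filename and return the full path.

    Hypothetical helper for illustration; assumes directory already exists.
    """
    target = os.path.join(directory, filename)
    # urlopen returns a file-like response, so copyfileobj can stream it in chunks.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target
```

Because the target is opened explicitly, the directory and the file name stay separate parameters and are combined only once, in os.path.join.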

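In the docs, the signature is urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None): there is no separate path parameter, and the second positional argument is filename. In your call, path is therefore taken as filename, and the additional filename=local_filename keyword raises a TypeError (multiple values for argument 'filename'). Your bare except catches that error and prints 'invalid url', which hides the real cause. Join the directory and the file name into a single path and pass that as filename.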
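A corrected version of your function could look roughly like this. It is a sketch, not the only way to do it: the name download_text_file is hypothetical, the check assumes the server reports a Content-Type starting with "text/", and the file name is taken from the last URL segment as in your code.

```python
import os
import urllib.request


def download_text_file(url, directory):
    """Download url into directory if the reported content type is text.

    Returns the path of the saved file, or None if nothing was downloaded.
    Hypothetical helper for illustration; assumes directory already exists.
    """
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "")
    if not content_type.startswith("text/"):
        print("No text file found at given URL, download aborted!")
        return None
    # urlretrieve has no directory argument, so build one full target path.
    local_filename = url.split("/")[-1]
    target = os.path.join(directory, local_filename)
    urllib.request.urlretrieve(url, filename=target)
    return target
```

With the names from the question, the call would be download_text_file(mcBethURL, '/home/wilma/PycharmProjects/Uni'). The sketch deliberately has no bare except, so an error such as the TypeError above would surface with its real message instead of 'invalid url'.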

Answered By: kpie