Creating URLs in a loop

Question:

I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but is not saving them in a list. Ultimately I want to download multiple files using urlretrieve.

for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls"
            print(url)

Output of above code is:

https://Here is a URL/P2000-01.xls
https://Here is a URL/P2001-02.xls
https://Here is a URL/P2002-03.xls
https://Here is a URL/P2003-04.xls
https://Here is a URL/P2004-05.xls
https://Here is a URL/P2005-06.xls
https://Here is a URL/P2006-07.xls
https://Here is a URL/P2007-08.xls
https://Here is a URL/P2008-09.xls
https://Here is a URL/P2009-10.xls
https://Here is a URL/P2010-11.xls
https://Here is a URL/P2011-12.xls
https://Here is a URL/P2012-13.xls
https://Here is a URL/P2013-14.xls
https://Here is a URL/P2014-15.xls
https://Here is a URL/P2015-16.xls
https://Here is a URL/P2016-17.xls

But this:

url

gives only:

'https://Here is a URL/P2016-17.xls'

How do I get all the URLs, not just the final one?

Asked By: FinickyBee

Answers:

You are overwriting url on every iteration, so only the final URL survives after the loop. You need to maintain a list and append each new value to it:

import urllib.parse

urls = []  # collect every URL instead of overwriting a single variable
for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        urls.append("https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls")
    if i == 9 and j == 10:
        urls.append("https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls")
    if i > 9:
        if i > 9 or j < 8:
            urls.append("https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls")

for url_value in urls:
    # keep ':' and '/' unescaped so that "https://" survives percent-encoding
    print(urllib.parse.quote(url_value, safe=":/"))
Answered By: sunilbaba

There are several things that could significantly simplify your code. First of all, this:

"https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"

could be simplified to this:

"https://Here is a URL/P200{}-0{}.xls".format(i, j)

And if you have at least Python 3.6, you could use an f-string instead:

f"https://Here is a URL/P200{i}-0{j}.xls"

Second of all, Python has several ways to conveniently pad numbers with zeroes; it can even be done as part of the f-string formatting. Additionally, range starts from zero by default.
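
To make the padding options concrete, here is a small sketch of a few standard ways to zero-pad an integer to two digits. The variable names are only for illustration; all of these produce the same string.

```python
num = 7

# Format spec inside an f-string (Python 3.6+): pad to width 2 with zeroes
padded_fstring = f"{num:02}"

# The same format spec with str.format
padded_format = "{:02}".format(num)

# str.zfill pads an existing string with leading zeroes
padded_zfill = str(num).zfill(2)

print(padded_fstring, padded_format, padded_zfill)  # 07 07 07

# Numbers that already have two digits are left unchanged
print(f"{12:02}")  # 12
```

Because the width is only a minimum, the same format spec handles both the single-digit and the two-digit cases, which is what removes the need for the if branches.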

So your entire original code is equivalent to:

for num in range(17):
    print(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Now, you want to actually use these URLs, not just print them out. You mentioned building a list, which can be done like so:

urls = []
for num in range(17):
    urls.append(f'https://Here is a URL/P20{num:02}-{num+1:02}.xls')

Or with a list comprehension:

urls = [f'https://Here is a URL/P20{num:02}-{num+1:02}.xls'
        for num in range(17)]
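
If you want to convince yourself that the simplified version matches your original branching logic, a quick check along these lines should do it. The function names original_urls and simplified_urls are made up for this sketch, and the URL is still the placeholder from the question.

```python
def original_urls():
    """Rebuild the list using the branching logic from the question."""
    urls = []
    for i, j in zip(range(0, 17), range(1, 18)):
        if i < 8 or j < 10:
            urls.append("https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls")
        if i == 9 and j == 10:
            urls.append("https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls")
        if i > 9:
            if i > 9 or j < 8:
                urls.append("https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls")
    return urls


def simplified_urls():
    """Build the same list with zero-padded f-string formatting."""
    return [f"https://Here is a URL/P20{num:02}-{num+1:02}.xls" for num in range(17)]


# Both approaches should produce the same 17 URLs in the same order
assert original_urls() == simplified_urls()
print(len(simplified_urls()))  # 17
```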

Based on your comments here and on your other question, you seem to be confused about what form you need these URLs to be in. Strings like this are already what you need. urlretrieve accepts the URL as a string, so you don’t need to do any further processing. See the example in the docs:

import urllib.request

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()

However, I would recommend not using urlretrieve, for two reasons.

  1. As the documentation mentions, urlretrieve is part of a legacy interface that might become deprecated at some point. If you’re going to use urllib, use the urlopen function instead.

  2. However, as Paul Becotte mentioned in an answer to your other question: if you’re looking to fetch URLs, I would recommend installing and using Requests instead of urllib. It’s more user-friendly.
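
For the urllib route in point 1, a minimal sketch of what a urlopen-based download could look like is below. The helper name download and its parameters are my own invention for illustration, not something from the question or the docs; it just streams the response body into a local file.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Fetch url with urlopen and stream the response body to dest_path.

    Hypothetical helper for illustration; the URL is passed as a plain string.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        # copyfileobj copies in chunks, so the whole file is never held in memory
        shutil.copyfileobj(response, out_file)
    return dest_path


# Example call (not executed here, since the URL in the question is a placeholder):
# download("https://Here is a URL/P2000-01.xls", "P2000-01.xls")
```

Note that urlopen raises an exception (urllib.error.HTTPError) for responses such as 404, so a failed download will not silently produce a file.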

Regardless of which method you choose, again, strings are fine. Here’s code that uses Requests to download each of the specified spreadsheets to your current directory:

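import requests

base_url = 'https://Here is a URL/'

for num in range(17):
    filename = f'P20{num:02}-{num+1:02}.xls'
    response = requests.get(base_url + filename)
    response.raise_for_status()  # stop on HTTP errors rather than saving an error page
    # write the raw bytes of the spreadsheet to the current directory
    with open(filename, 'wb') as f:
        f.write(response.content)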
Answered By: CrazyChucky