Extract images and text from a sequence of URLs

Question:

I’m trying to write a script to extract images and text from a sequence of URLs. The URLs are from the same website but with different parameters. After reading StackOverflow and other sites, I have "created" a script that works, but I have a problem when I try to process a sequence.

I tried to use a while loop so that, if the option entered is "1", the script generates a range (00001, 00002…) and appends each value to a URL (http://example.com/page/00001). Everything works (even the image and text extraction), but it only extracts from one URL. I have tried building a list and other approaches, but I haven’t managed to get it working.

Here is the code that works but only extracts from one URL:

def getUrl(opt, baseUrl):
    out_folder = "/monedasWiki/monedas"
    print "Script instructions \n Don't worry, it's not complicated, but follow the steps"
    print "Enter 1 to get files 00001 to 00010"
    print "Enter 2 to get files 00010 to 00099"
    print "Enter 3 to get files 00100 to 00999"
    print "Enter 4 to get files 01000 to 09999"
    print "Enter 5 to get files 10000 to 19999"
    optSel = int(input(opt))
    # i iterates over the range
    # urlI is i converted to a string
    # baseUrl is the link to the Pliego website
    # url is the full URL with the required parameters
    while True:
        if optSel == 1:
            try:
                for i in range(0,10):
                    r = str(0).zfill(4)
                    urlI = str(i)
                    print r + urlI # only to verify that it works
                    url = baseUrl + r + urlI
            except ValueError:
                print "Enter the correct range"
                continue
        elif optSel == 2:
            try:
                for i in range(10,100):
                    r = str(0).zfill(3)
                    urlI = str(i)
                    print r + urlI # only to verify that it works
                    url = baseUrl + r + urlI
            except ValueError:
                print "Enter the correct range"
                continue
        elif optSel < 0:
            print "Value below 0"
            continue
        else:
            print "Something went wrong"
            break

        main(url, out_folder)

I included only two branches to keep the code short. If you could point out where the mistake is and what I could do to achieve what I want, I would be grateful.

Asked By: Ivanhercaz

Answers:

You have to move the line below inside the for loops, so that it runs once for every URL instead of once after the loop has finished:

main(url, out_folder)

That is, something like this:

while True:
    if optSel == 1:
        try:
            for i in range(0,10):
                r = str(0).zfill(4)
                urlI = str(i)
                print r + urlI
                url = baseUrl + r + urlI
                main(url, out_folder)
        except ValueError:
            print "Enter the correct range"
            continue
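
A minimal sketch of why the position of the call matters, written for Python 3 (the code above is Python 2). Here `main` is a hypothetical stand-in that only records the URL it receives, since the real `main` from the question is not shown, and `base_url` uses the example address from the question.

```python
# Hypothetical stand-in for the question's main(); it only records the URL.
processed = []

def main(url, out_folder):
    processed.append(url)

base_url = "http://example.com/page/"  # example address from the question
out_folder = "/monedasWiki/monedas"

# Call placed after the loop: url is overwritten on every pass,
# so main() only ever sees the last value.
for i in range(0, 10):
    url = base_url + str(0).zfill(4) + str(i)
main(url, out_folder)
after_loop = list(processed)

# Call placed inside the loop: main() runs once per URL.
del processed[:]
for i in range(0, 10):
    url = base_url + str(0).zfill(4) + str(i)
    main(url, out_folder)
inside_loop = list(processed)

print(len(after_loop), len(inside_loop))
```

With the call after the loop, `after_loop` holds a single URL (the one built on the final iteration); with the call inside the loop, `inside_loop` holds one entry per iteration.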
Answered By: j4hangir

Depending on what main() does, something like this:

def getUrl(opt, baseUrl):
    out_folder = "/monedasWiki/monedas"
    print "Script instructions \n Don't worry, it's not complicated, but follow the steps"
    print "Enter 1 to get files 00001 to 00010"
    print "Enter 2 to get files 00010 to 00099"
    print "Enter 3 to get files 00100 to 00999"
    print "Enter 4 to get files 01000 to 09999"
    print "Enter 5 to get files 10000 to 19999"
    optSel = int(input(opt))
    # i iterates over the range
    # urlI is i converted to a string
    # baseUrl is the link to the Pliego website
    # url is the full URL with the required parameters
    if optSel == 1:
        try:
            for i in range(0,10):
                r = str(0).zfill(4)
                urlI = str(i)
                print r + urlI # only to verify that it works
                url = baseUrl + r + urlI
                main(url, out_folder)
        except ValueError:
            print "Enter the correct range"
    elif optSel == 2:
        try:
            for i in range(10,100):
                r = str(0).zfill(3)
                urlI = str(i)
                print r + urlI # only to verify that it works
                url = baseUrl + r + urlI
                main(url, out_folder)
        except ValueError:
            print "Enter the correct range"
    elif optSel < 0:
        print "Value below 0"
    else:
        print "Something went wrong"
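
The two branches differ only in their range and in how many zeros are prepended, so they could be collapsed into one loop. The following is a rough sketch in Python 3, not the original code: `RANGES`, `build_urls`, `process_option`, and the `handler` parameter are illustrative names, and `handler` stands in for whatever `main` does. It assumes every page number is padded to five digits, as in the question's examples.

```python
# Menu option -> (start, stop), mirroring the ranges used for options 1 and 2
# in the code above. Further options could be added the same way.
RANGES = {
    1: (0, 10),
    2: (10, 100),
}

def build_urls(base_url, start, stop, width=5):
    """Return one URL per number in [start, stop), zero-padded to `width` digits."""
    return [base_url + str(i).zfill(width) for i in range(start, stop)]

def process_option(opt_sel, base_url, out_folder, handler):
    """Call handler(url, out_folder) once for every URL of the chosen option."""
    if opt_sel not in RANGES:
        raise ValueError("unknown option: %r" % (opt_sel,))
    start, stop = RANGES[opt_sel]
    for url in build_urls(base_url, start, stop):
        handler(url, out_folder)

if __name__ == "__main__":
    # Hypothetical handler that just prints each URL instead of downloading.
    def show(url, out_folder):
        print(url)

    process_option(1, "http://example.com/page/", "/monedasWiki/monedas", show)
```

Padding the whole number with `str(i).zfill(5)` removes the need to choose a different count of leading zeros per branch, which is what `str(0).zfill(4)` and `str(0).zfill(3)` do by hand.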
Answered By: Robert Moskal