Using "return" function instead of "print" in a scraper

Question

In the script below, if I replace the “return” statement with “print”, I get all the results. However, if I run it as it is, I get only the first item. How can I get all the results using “return” in this case? What should the process be?

Here is the script:

import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        return title, title_link

print(abacus_scraper(main_link))

Result:

('2017 - Volume 53 Abacus', '/journal/10.1111/(ISSN)1467-6281/issues?activeYear=2017')

Asked By: SIM

Answer 1

As soon as you return from a function, you exit the for loop.
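The early exit can be reproduced without any network access. The following is a minimal sketch; `first_only` and the sample list are hypothetical names made up purely for illustration and are not part of the original script.

```python
def first_only(items):
    for item in items:
        # return ends the whole function call here, so the loop
        # never gets a chance to reach the second item
        return item


sample = ["2017", "2016", "2015"]  # made-up stand-in data
print(first_only(sample))  # prints 2017
```

Swapping `return item` for `print(item)` would show all three values, which matches the behaviour described in the question.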

You should keep a list inside abacus_scraper and append to it on each iteration. Once the loop has finished, return the list.

For example:

import requests
from lxml import html

main_link = "http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-6281/issues"

def abacus_scraper(main_link):
    results = []  # collects one [title, link] pair per matching link
    tree = html.fromstring(requests.get(main_link).text)
    for titles in tree.cssselect("a.issuesInYear"):
        title = titles.cssselect("span")[0].text
        title_link = titles.attrib['href']
        results.append([title, title_link])
    # return only after the loop has visited every matching link
    return results

print(abacus_scraper(main_link))
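A possible alternative, not part of the answer above, is to turn the function into a generator with `yield`, which hands back one pair at a time without leaving the loop for good. The sketch below runs offline: `iter_issues` is a hypothetical name, and the list of dictionaries is made-up stand-in data for the parsed link elements, not real output from the site.

```python
def iter_issues(anchors):
    """Yield one (title, link) pair per anchor instead of returning once."""
    for anchor in anchors:
        # yield pauses the function and resumes it on the next request,
        # so the loop continues where return would have ended it
        yield anchor["title"], anchor["href"]


# Made-up stand-in data; a real scraper would produce these from the page
sample_anchors = [
    {"title": "Issue A", "href": "/issues/a"},
    {"title": "Issue B", "href": "/issues/b"},
]

for title, link in iter_issues(sample_anchors):
    print(title, link)

# A generator can still be collected into a list when one is needed
all_results = list(iter_issues(sample_anchors))
```

The list-based version is simpler when all results are needed at once; a generator may be preferable when the caller wants to process items as they arrive.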

Answered By: Solaxun
