How to repeat just a certain part of the function in python if a condition is met?
Question:
I am writing a web-scraping script that does different things depending on what was scraped from the website.
So far everything works. But sometimes the website randomly loads a bit slower and I get the text "Loading" in my result.
If that happens, I want the script to wait a few seconds and scrape again, then use the new results to do everything under the else:
part of the function.
The problem is I have no idea how to make this work. I Googled a bit and it seems a while loop is the solution, but I cannot figure out how to implement it in my code (because the action I want to repeat is inside the very same function?). Or is there a better way to do it?
Here is my code:
import json
import time

def Webscraping(page_url):
    # Webscraping the URL from user input
    ...

def MakeTableFromData(input):
    # input some data, print out a table made from data
    ...

def Work(page_url):
    scrapped_text = Webscraping(page_url)
    scrapped_string = ''.join(scrapped_text)
    newlistings1 = scrapped_string.split('\n')
    # turn the scraped text into a list
    with open(r'C:\Users\User\Documents\NEW_LISTINGS1.json', 'w', encoding='UTF-8') as f:
        json.dump(newlistings1, f, ensure_ascii=False)
    if "Sorry but you need to complete the captcha test to continue" in newlistings1:
        print("Captcha Test")
        exit()
    elif "No match" in newlistings1:
        print("No listing currently")
    elif "Loading" in newlistings1:
        time.sleep(3)
        # I don't know how to write this part:
        # wait a bit and repeat the whole process of web scraping,
        # then do everything underneath else:
        pass
    else:
        # Do things with data
        MakeTableFromData(newlistings1)
        with open(r'C:\Users\User\Documents\old_listing1.json', 'w', encoding='UTF-8') as f:
            json.dump(newest1, f, ensure_ascii=False)
        print("Updated old_listing1.json")

page_url = input("Enter the link.")
Work(page_url)
Answers:
A simple way would be to do this recursively, so that the scraping logic is always the same. To do so, you just have to call your function Work again under that condition:
elif "Loading" in newlistings1:
    time.sleep(3)
    Work(page_url)
However, there is an associated risk: what if the webpage is broken and you always end up in the "Loading" case? Then you would have an infinite loop. There are many ways to solve this. You could add a parameter count=0 to the function Work: def Work(page_url, count=0). Then, the first part of the body should check that count is under a threshold (say, 3) and, if not, return (to break the infinite chain of calls). Your condition could then call the function with Work(page_url, count + 1):
def Work(page_url, count=0):
    if count > 3:
        return
    # ... scrape the page and build newlistings1 as before ...
    if "Sorry but you need to complete the captcha test to continue" in newlistings1:
        # ... handle the captcha case as before ...
        pass
    elif "Loading" in newlistings1:
        time.sleep(3)
        Work(page_url, count + 1)
    else:
        # ... do things with data, as before ...
        pass
This is just one way and probably not the best one, but it would work fine.
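Since the question already guessed that a while loop is the answer: a non-recursive retry loop does the same job without growing the call stack, and it keeps the "use the new results under else:" behavior in one place. Below is a minimal, self-contained sketch. The scrape function here is a hypothetical stand-in for the real Webscraping call (it returns "Loading" twice before yielding data, just to demonstrate the retry), and the short sleep is only for the demo; use 3 seconds in practice.

```python
import time

def scrape(page_url):
    # Hypothetical stand-in for the real Webscraping(page_url):
    # pretends the page is still loading on the first two calls.
    scrape.calls += 1
    return ["Loading"] if scrape.calls < 3 else ["listing A", "listing B"]
scrape.calls = 0

def work(page_url, max_retries=3):
    # Try the scrape up to max_retries + 1 times; on "Loading",
    # wait a moment and try again instead of recursing.
    for attempt in range(max_retries + 1):
        listings = scrape(page_url)
        if "Loading" not in listings:
            return listings      # fresh data: proceed with the else: logic
        time.sleep(0.01)         # demo-sized delay; use time.sleep(3) for real
    return None                  # gave up: the page never finished loading

result = work("https://example.com")
print(result)
```

With a loop, the "give up after N tries" threshold is just the range bound, and there is no extra count parameter threaded through the function signature.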