How to create a website update tool that checks whether the link in a specific button has changed?

Question:

I’m very new to Python and I’m trying to create a website update tool that checks whether the link contained in a specific button has changed.

This is the code I have used:

import requests
from bs4 import BeautifulSoup


url = 'https://www.keldagroup.com/investors/creditor-considerations/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')

urls = []


for link in soup.find_all('a'):
    print(link.get('href'))

However, it prints a long list of links like this:

/about-kelda-group/
/about-kelda-group/kelda-group-vision-and-values/
/about-kelda-group/group-profile/
/about-kelda-group/sustainability-and-corporate-social-responsibility/
/about-kelda-group/group-profile/chief-executive-statement/
None

The URL I want always stays in the same place on the website. How do I pick that one URL out of this output? I can then write some code to check whether it has changed.

If you know of a simpler way to solve my issue please let me know.

Asked By: van10


Answers:

You can do something like this (adapt the output and alerting to your needs):

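import time
from urllib.parse import urljoin

import requests
import schedule
from bs4 import BeautifulSoup

URL = 'https://www.keldagroup.com/investors/creditor-considerations/'


def get_link():
    """Return the absolute href of the 'Investor presentation' button, or None if it is missing."""
    reqs = requests.get(URL, timeout=30)
    reqs.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(reqs.text, 'html.parser')

    # The attribute name and value must match the page's HTML exactly (check with your browser's inspector).
    button = soup.find('a', {'arial-label': 'Investor presentation'})
    if button is None or button.get('href') is None:
        return None
    # Resolve a relative href (e.g. '/media/...') against the page URL so it compares equal to a full URL.
    return urljoin(URL, button.get('href'))


def handle_changes_to_link(value):
    new_link = get_link()
    if new_link is None:
        print('button not found; the page layout may have changed')
    elif new_link == value:
        print('link is the same')
    else:
        print('link has changed:', new_link)


schedule.every(10).seconds.do(
    handle_changes_to_link,
    value='https://www.keldagroup.com/media/1399/yw-investor-update-1-dec-2021.pdf',
)
while True:
    schedule.run_pending()
    time.sleep(1)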

This checks every 10 seconds (you may want to change that to hours or days) whether the link still matches its original value (hardcoded as ‘https://www.keldagroup.com/media/1399/yw-investor-update-1-dec-2021.pdf’) and prints:

link is the same
link is the same
link is the same
link is the same
link is the same
link is the same
link is the same
link is the same
link is the same
[...]

If the link changes, it prints ‘link has changed:’ followed by the new link.
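Because the original value is hardcoded, the script keeps printing ‘link has changed’ on every run after the first change. A minimal sketch, assuming you want each change reported only once and the baseline to survive restarts, is to remember the last seen link in a small text file. The function check_for_change and the file name last_link.txt are hypothetical, not part of schedule or BeautifulSoup:

```python
from pathlib import Path


def check_for_change(new_link, state_file='last_link.txt'):
    """Compare new_link with the link stored in state_file (hypothetical file name).

    Returns True, and stores new_link, when it differs from the stored link.
    Returns False when it is unchanged. On the first run there is nothing to
    compare against, so the link is stored as the baseline and False is returned.
    """
    path = Path(state_file)
    if not path.exists():
        path.write_text(new_link, encoding='utf-8')  # first run: record the baseline
        return False
    old_link = path.read_text(encoding='utf-8').strip()
    if new_link == old_link:
        return False
    path.write_text(new_link, encoding='utf-8')  # remember the new link for next time
    return True
```

In the scheduled job you could then call check_for_change(get_link()) instead of comparing against a hardcoded value, and only print (or send an alert) when it returns True.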

Documentation for schedule is at: https://schedule.readthedocs.io/en/stable/

The BeautifulSoup (bs4) documentation is at: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
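If you would rather not depend on BeautifulSoup, or want to test the button-selection step offline against saved HTML, here is a stdlib-only sketch that uses html.parser.HTMLParser in place of soup.find (a swapped-in technique, not what the answer above uses). The function find_href_by_attr and the sample markup are hypothetical; the attribute name and value you pass must match the real page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorFinder(HTMLParser):
    """Records the href of the first <a> whose attribute `attr` equals `value`."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr.lower()  # HTMLParser lower-cases attribute names
        self.value = value
        self.href = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; stop once a match has been found
        if tag != 'a' or self.href is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get(self.attr) == self.value:
            self.href = attr_map.get('href')


def find_href_by_attr(html, attr, value, base_url):
    """Return the absolute href of the first matching <a>, or None if there is none."""
    finder = _AnchorFinder(attr, value)
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    return urljoin(base_url, finder.href)  # turn '/media/...' into a full URL


if __name__ == '__main__':
    # Made-up markup standing in for the real page.
    sample = '''
    <a href="/about-kelda-group/">About</a>
    <a href="/media/1399/yw-investor-update-1-dec-2021.pdf"
       arial-label="Investor presentation">Investor presentation</a>
    '''
    base = 'https://www.keldagroup.com/investors/creditor-considerations/'
    print(find_href_by_attr(sample, 'arial-label', 'Investor presentation', base))
    # prints https://www.keldagroup.com/media/1399/yw-investor-update-1-dec-2021.pdf
```

Selecting by an attribute is usually sturdier than picking the Nth entry from the full list of links, since other links added to the page would shift the position.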

Answered By: platipus_on_fire