Web scraping using Python

Question

I’m trying to get data for a list of companies (currently testing with only one) from a website. I can’t work out how to get the score I want: I can only find the element that holds the formatting, not the actual data. Could someone please help?

from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

driver = webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')

driver.get('https://www.refinitiv.com/en/sustainable-finance/esg-scores')

driver.maximize_window()
time.sleep(1)

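try:
    # Dismiss the cookie banner if it is present; the lookup itself can fail, so it belongs inside the try.
    cookie = driver.find_element("xpath", '//button[@id="onetrust-accept-btn-handler"]')
    cookie.click()
except Exception:
    pass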

company_name=driver.find_element("id",'searchInput-1')
company_name.click()
company_name.send_keys('Jumbo SA')
time.sleep(1)

search=driver.find_element("xpath", '//button[@class="SearchInput-searchButton"]')
search.click()
time.sleep(2)

company_score = driver.find_elements("xpath",'//div[@class="fiscal-year"]')

print(company_score)

That’s what I have so far. I want the number "42" returned in my results, but instead I get the following:

[<selenium.webdriver.remote.webelement.WebElement (session="bffa2fe80dd3785618b5c52d7087096d", element="62eaf2a8-d1a2-4741-8374-c0f970dfcbfe")>]

My issue is that the locator is not working.

//div[@class="fiscal-year"] = This part I think is wrong – but I am not sure what I need to pick from the website.

Website Screenshot

Asked By: ariadne

Answer 1

Assuming your locators are correct (I did not test them), you could do this:

company_score = driver.find_elements("xpath",'//div[@class="fiscal-year"]')
for element in company_score:
    print(element.text)

A few issues with your code:

You are not using waits, so dynamic elements (those that load after the page’s HTML skeleton) will be difficult to locate.
find_elements returns a list of WebElement objects, so printing the list shows the objects rather than their text; you need to print each element’s .text individually.
After the page loads with the results, searching for the XPath //div[@class="fiscal-year"] returns no matches; again, are you sure this locator is correct?

See documentation on Selenium Waits for more details.

Answered By: Barry the Platipus

Answer 2

Thank you so much for your help.
I’m sorry, I don’t think I was very clear in my question (I’m quite new to this).

My issue is that the locator is not working.

//div[@class="fiscal-year"] = This part I think is wrong – but I am not sure what I need to pick from the website.

Website Screenshot

Answered By: ariadne

Answer 3

Please use requests instead; look at this example:

import requests

url = "https://www.refinitiv.com/bin/esg/esgsearchsuggestions"

payload = ""
response = requests.request("GET", url, data=payload)

print(response.text)

This returns something like the following:

[
    {
        "companyName": "GEK TERNA Holdings Real Estate Construction SA",
        "ricCode": "HRMr.AT"
    },
    {
        "companyName": "Mytilineos SA",
        "ricCode": "MYTr.AT"
    },
    {
        "companyName": "Hellenic Telecommunications Organization SA",
        "ricCode": "OTEr.AT"
    },
    {
        "companyName": "Jumbo SA",
        "ricCode": "BABr.AT"
    },
    {
        "companyName": "Folli Follie Commercial Manufacturing and Technical SA",
        "ricCode": "HDFr.AT"
    },
    ...
]

In this response each company name is paired with its RIC code; for Jumbo SA the code is BABr.AT. With that code, request the data:

import requests

url = "https://www.refinitiv.com/bin/esg/esgsearchresult"

querystring = {"ricCode":"BABr.AT"} ## supply the company code

payload = ""
headers = {"cookie": "encaddr=NeVecfNa7%2FR1rLeYOqY57g%3D%3D"}

response = requests.request("GET", url, data=payload, headers=headers, params=querystring)

print(response.text)

The response is JSON:

{
    "industryComparison": {
        "industryType": "Specialty Retailers",
        "scoreYear": "2020",
        "rank": "162",
        "totalIndustries": "281"
    },
    "esgScore": {
        "TR.TRESGCommunity": {
            "score": 24,
            "weight": 0.13
        },
        "TR.TRESGInnovation": {
            "score": 9,
            "weight": 0.05
        },
        "TR.TRESGHumanRights": {
            "score": 31,
            "weight": 0.08
        },
        "TR.TRESGShareholders": {
            "score": 98,
            "weight": 0.08
        },
        "TR.SocialPillar": {
            "score": 43,
            "weight": 0.42999998
        },
        "TR.TRESGEmissions": {
            "score": 19,
            "weight": 0.08
        },
        "TR.TRESGManagement": {
            "score": 47,
            "weight": 0.26
        },
        "TR.GovernancePillar": {
            "score": 53,
            "weight": 0.38999998569488525
        },
        "TR.TRESG": {
            "score": 42,
            "weight": 1
        },
        "TR.TRESGWorkforce": {
            "score": 52,
            "weight": 0.1
        },
        "TR.EnvironmentPillar": {
            "score": 20,
            "weight": 0.19
        },
        "TR.TRESGResourceUse": {
            "score": 30,
            "weight": 0.06
        },
        "TR.TRESGProductResponsibility": {
            "score": 62,
            "weight": 0.12
        },
        "TR.TRESGCSRStrategy": {
            "score": 17,
            "weight": 0.05
        }
    }
}
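
To pull the single number out of that JSON, something like the following should work. It is a sketch under one assumption: that "TR.TRESG" is the overall score, which is inferred from its weight of 1 and its value of 42 matching the number the question asks for. The function name overall_score is made up for illustration, and the sample is a trimmed copy of the response above.

```python
import json

# Trimmed copy of the response shown above; with requests you would use response.json().
sample = """
{
    "industryComparison": {"industryType": "Specialty Retailers", "scoreYear": "2020"},
    "esgScore": {
        "TR.SocialPillar": {"score": 43, "weight": 0.42999998},
        "TR.TRESG": {"score": 42, "weight": 1}
    }
}
"""


def overall_score(data):
    """Return the score under esgScore -> TR.TRESG, or None if any key is missing.

    Assumption: TR.TRESG is the overall ESG score.
    """
    return data.get("esgScore", {}).get("TR.TRESG", {}).get("score")


data = json.loads(sample)
print(overall_score(data))  # 42
```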

Now you can get the data you want without using Selenium. This approach is faster and simpler than driving a browser.
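
Since the question mentions a list of companies, the two requests can be combined roughly as follows. This is only a sketch: the helper names find_ric_code and fetch_scores are hypothetical, the exact case-insensitive name match is an assumption about how you want to look companies up, and whether the cookie header from the second request is actually required has not been checked, so headers is left as an optional parameter.

```python
import requests

SUGGESTIONS_URL = "https://www.refinitiv.com/bin/esg/esgsearchsuggestions"
RESULT_URL = "https://www.refinitiv.com/bin/esg/esgsearchresult"


def find_ric_code(suggestions, company_name):
    """Return the ricCode whose companyName matches exactly (ignoring case), else None."""
    wanted = company_name.strip().lower()
    for entry in suggestions:
        if entry.get("companyName", "").strip().lower() == wanted:
            return entry.get("ricCode")
    return None


def fetch_scores(company_names, session=None, headers=None):
    """Map each company name to its TR.TRESG score (None when it cannot be found)."""
    session = session or requests.Session()

    # One request for the name -> RIC code list, reused for every company.
    listing = session.get(SUGGESTIONS_URL, headers=headers, timeout=30)
    listing.raise_for_status()
    suggestions = listing.json()

    scores = {}
    for name in company_names:
        ric = find_ric_code(suggestions, name)
        if ric is None:
            scores[name] = None
            continue
        response = session.get(
            RESULT_URL, params={"ricCode": ric}, headers=headers, timeout=30
        )
        response.raise_for_status()
        data = response.json()
        # Assumption: TR.TRESG holds the overall score (weight 1 in the sample).
        scores[name] = data.get("esgScore", {}).get("TR.TRESG", {}).get("score")
    return scores


# Usage (performs real HTTP requests):
# print(fetch_scores(["Jumbo SA", "Mytilineos SA"]))
```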

Please accept this as an answer.

Answered By: d-dutchview
