Cannot scrape Glassdoor's rating stars
Question:
Answers:
As Dan explained in the comments, the star element has a CSS class (.css-1ihykkv) applied to it. The rules for that class set a linear-gradient in the background property, and that gradient contains the percentages of green and grey used to draw the rating.
Once you have found this class and its background property, you can extract the percentage. Here is an example of how to read a CSS value with Selenium:
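from selenium.webdriver.common.by import By

# `driver` is an already-initialised Selenium WebDriver.
bg_color = driver.find_element(By.XPATH, "//button[contains(@class,'btn-primary')]").value_of_css_property("background-color")
print(bg_color)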
The output should be like this:
rgba(0, 123, 255, 1)
Try reading the background property in the same way. The percentage it contains can then be used for the different subratings, such as Culture & Values.
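To turn that percentage into a star count, something like the following could work. This is only a sketch: the helper name `stars_from_gradient` and the sample string are hypothetical, and it assumes the first percentage stop in the gradient marks where the green fill ends. Depending on the browser, the gradient may be reported under background-image rather than background.

```python
import re
from typing import Optional


def stars_from_gradient(background: str, max_stars: int = 5) -> Optional[float]:
    """Convert the first percentage stop of a CSS gradient into a star rating.

    Assumes the first "<number>%" in the string is the filled (green) share.
    Returns None when the string contains no percentage at all.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", background)
    if match is None:
        return None
    return round(float(match.group(1)) / 100 * max_stars, 2)


# Hypothetical value; in practice this would come from
# element.value_of_css_property("background") on the star element.
sample = "linear-gradient(90deg, rgb(12, 170, 65) 80%, rgb(222, 224, 227) 80%)"
print(stars_from_gradient(sample))
```

With the sample string above, 80% of five stars gives a rating of 4.0.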
I found that the same CSS class is always used for the same number of stars. For example, the CSS class for four stars is "css-94nhxw" and the class for one star is "css-1mfncox". I use this in my code to work out the rating from whichever class is present.
For example, my code for the Work/Life Balance subrating looks like this:
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# Class observed for each star count; Glassdoor changes these from time to time.
STAR_CLASSES = {
    "css-11w4osi": "5",
    "css-94nhxw": "4",
    "css-k58126": "3",
    "css-1lp3h8x": "2",
    "css-1mfncox": "1",
}

# Path to the sibling <div> that carries the star class for this subrating.
SUBRATING_XPATH = (
    './/div[@class="tooltipContainer"]/div[@class="content"]/ul[@class="pl-0"]'
    '/li/div[text()="Work/Life Balance"]/following-sibling::div[@class="{} e1hd5jg10"]'
)

def scrape_work_life_balance(gdReview):
    # Assumed intent of the first check: a review without the subrating
    # dropdown icon has no subratings. The original XPath ('[not@class=...]')
    # is not valid XPath, so it is rewritten here as a presence test.
    if not gdReview.find_elements(By.XPATH, './/span[@class="SVGInline d-flex css-hcqxoa"]'):
        return "Null"
    # Try each known class in turn, from five stars down to one.
    for css_class, stars in STAR_CLASSES.items():
        try:
            gdReview.find_element(By.XPATH, SUBRATING_XPATH.format(css_class))
            return stars
        except NoSuchElementException:
            continue
    return "Zero"
Just keep in mind that CSS classes can change. Glassdoor changed them within the last few months, and I updated the ones above today.
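Because the class names change, it may help to keep the class-to-stars lookup in one place so that only a single table needs updating. A minimal sketch, assuming the class names quoted above are still current; the helper name `stars_from_class` is hypothetical:

```python
from typing import Dict, Optional

# Class names taken from the answer above; update them when Glassdoor changes its CSS.
STAR_CLASSES: Dict[str, int] = {
    "css-11w4osi": 5,
    "css-94nhxw": 4,
    "css-k58126": 3,
    "css-1lp3h8x": 2,
    "css-1mfncox": 1,
}


def stars_from_class(class_attr: str, mapping: Dict[str, int] = STAR_CLASSES) -> Optional[int]:
    """Return the star count for the first known token in a class attribute.

    Returns None if none of the tokens is a known star class.
    """
    for token in class_attr.split():
        if token in mapping:
            return mapping[token]
    return None


# With Selenium the argument would come from element.get_attribute("class").
print(stars_from_class("css-94nhxw e1hd5jg10"))
```

This reads the class attribute once instead of running one XPath query per possible rating, and an unknown class yields None rather than being mistaken for zero stars.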