Get the experience section of a LinkedIn profile with Selenium and Python

Question:

I’m trying to get information from the Experience and Education sections.

For example, from this LinkedIn profile: https://www.linkedin.com/in/kendra-tyson/

I want to extract all the information from the Experience and Education sections.

For now I’ve been working on the Experience section. I want to get the container that holds all the different jobs in the Experience section, so that I can iterate through it and get the individual jobs (e.g.
Talent Acquisition & Human Resources Manager, Technical Recruiter).

I’m using find_elements by XPath with Selenium, but it times out / doesn’t find the XPath:

   experience = wait.until(EC.visibility_of_all_elements_located((By.XPATH, make_xpath_experience)))

The XPaths that I have tried are:

make_xpath_experience = "//div[@id='experience']/div[.//h2[text()='Experience']]//ul[contains(@class, 'pvs-list')]"
make_xpath_experience = "//section[@id='experience']//li[contains(@class, 'pvs-list__outer-container')]"

I also tried a CSS selector based on this Stack Overflow question (Linkedin Webscrape w Selenium), updated because the selectors used in that answer are no longer present on the page:

experience = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#experience . pvs-list__outer-container')))

I also tried following this GeeksforGeeks tutorial using BeautifulSoup (https://www.geeksforgeeks.org/scrape-linkedin-using-selenium-and-beautiful-soup-in-python/), but the information is outdated and does not work.

How can I target the Experience section of the profile and then extract the individual jobs and their details (i.e. full-time, timeline, location)?

Asked By: Jesper Ezra


Answers:

The following code creates a dictionary and populates it with each job's title, company, date, location and description.

from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

exp = {key: [] for key in ['job', 'company', 'date', 'location', 'description']}

# Each <li> directly under the experience section is one job entry
jobs = driver.find_elements(By.CSS_SELECTOR, 'section:has(#experience)>div>ul>li')
for job in jobs:
    exp['job'].append(job.find_element(By.CSS_SELECTOR, 'span[class="mr1 t-bold"] span').text)
    exp['company'].append(job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal"] span').text)
    exp['date'].append(job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"] span').text)
    # Location and description are not always present, so guard against NoSuchElementException
    try:
        exp['location'].append(job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"]:nth-child(4) span').text)
    except NoSuchElementException:
        exp['location'].append('*missing value*')
    try:
        exp['description'].append(job.find_element(By.CSS_SELECTOR, 'ul li ul span[aria-hidden=true]').text)
    except NoSuchElementException:
        exp['description'].append('*missing value*')

import pandas as pd
pd.DataFrame(exp)


If you want, you can also export the table to a CSV file.
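For example, a minimal sketch using pandas' to_csv (the sample values and the output filename are made up for illustration):

```python
import pandas as pd

# Sample data standing in for the scraped `exp` dictionary (hypothetical values)
exp = {
    'job': ['Technical Recruiter'],
    'company': ['Acme Corp'],
    'date': ['Jan 2020 - Present'],
    'location': ['Remote'],
    'description': ['*missing value*'],
}

df = pd.DataFrame(exp)
df.to_csv('experience.csv', index=False)  # hypothetical output filename
```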

Update 3

Using JavaScript we can avoid the try/except blocks: the optional chaining operator (?.) makes querySelector return null when an element is absent, which Selenium converts to None. So if location or description is missing, the value will be None instead of *missing value*.

from selenium.webdriver.common.by import By

exp = {key: [] for key in ['job', 'company', 'date', 'location', 'description']}
jobs = driver.find_elements(By.CSS_SELECTOR, 'section:has(#experience)>div>ul>li')
for job in jobs:
    exp['job'].append(job.find_element(By.CSS_SELECTOR, 'span[class="mr1 t-bold"] span').text)
    exp['company'].append(job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal"] span').text)
    exp['date'].append(job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"] span').text)
    # ?. yields null (None in Python) when the element is missing; note the escaped quotes
    exp['location'].append(driver.execute_script(
        'return arguments[0].querySelector("span[class=\'t-14 t-normal t-black--light\']:nth-child(4) span")?.innerText', job))
    exp['description'].append(driver.execute_script(
        'return arguments[0].querySelector("ul li ul span[aria-hidden=true]")?.innerText', job))
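If you prefer the earlier *missing value* placeholder over None, the resulting table can be normalized afterwards with pandas' fillna. A small sketch with made-up values standing in for the scraped dictionary:

```python
import pandas as pd

# Dict-of-lists as produced by the loop above; None marks a missing field (made-up values)
exp = {
    'job': ['Technical Recruiter', 'HR Manager'],
    'company': ['Acme Corp', 'Globex'],
    'date': ['2020 - 2022', '2018 - 2020'],
    'location': ['Remote', None],
    'description': [None, 'Led hiring'],
}

df = pd.DataFrame(exp)
df = df.fillna('*missing value*')  # replace None with the earlier placeholder
```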


Answered By: sound wave