Get the Experience section of a LinkedIn profile with Selenium and Python
Question:
I’m trying to get information from the Experience and Education sections.
For example, on this LinkedIn profile: https://www.linkedin.com/in/kendra-tyson/
I want to extract all the information in the Experience section and the Education section.
For now I’ve been working on the Experience section. I want to get the container that holds all the different jobs in the Experience section so I can iterate through it and extract the individual jobs (i.e.,
Talent Acquisition & Human Resources Manager, Technical Recruiter).
I’m using find_elements with an XPath in Selenium, but the wait times out / the XPath is not found.
experience = wait.until(EC.visibility_of_all_elements_located((By.XPATH, make_xpath_experience)))
The XPaths that I have tried are:
make_xpath_experience = "//div[@id='experience']/div[.//h2[text()='Experience']]//ul[contains(@class, 'pvs-list')]"
make_xpath_experience = "//section[@id='experience']//li[contains(@class, 'pvs-list__outer-container')]"
I also tried a CSS selector per this Stack Overflow question, with updated parameters since the ones used in that answer are no longer available: Linkedin Webscrape w Selenium
experience = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#experience . pvs-list__outer-container')))
I also tried following this GeeksforGeeks tutorial with BeautifulSoup (https://www.geeksforgeeks.org/scrape-linkedin-using-selenium-and-beautiful-soup-in-python/), but the information is outdated and does not work.
How can I target the Experience section of the profile and then extract the individual jobs and their details (i.e., full-time, timeline, location)?
Answer:
The following code creates a dictionary and populates it with job name, company, date, location, and description.
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import pandas as pd

# One list per field; each job appends one value to every list
exp = {key: [] for key in ['job', 'company', 'date', 'location', 'description']}

jobs = driver.find_elements(By.CSS_SELECTOR, 'section:has(#experience)>div>ul>li')
for job in jobs:
    exp['job'] += [job.find_element(By.CSS_SELECTOR, 'span[class="mr1 t-bold"] span').text]
    exp['company'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal"] span').text]
    exp['date'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"] span').text]
    try:
        exp['location'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"]:nth-child(4) span').text]
    except NoSuchElementException:
        exp['location'] += ['*missing value*']
    try:
        exp['description'] += [job.find_element(By.CSS_SELECTOR, 'ul li ul span[aria-hidden=true]').text]
    except NoSuchElementException:
        exp['description'] += ['*missing value*']

pd.DataFrame(exp)
If you want, you can then export the table to a CSV file.
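For example, a minimal sketch of the CSV export (the sample rows and the output filename are illustrative, not from the original profile):

```python
import pandas as pd

# Example data shaped like the exp dictionary built above
exp = {
    'job': ['Technical Recruiter'],
    'company': ['Acme Corp'],
    'date': ['Jan 2020 - Present'],
    'location': ['Remote'],
    'description': ['*missing value*'],
}

df = pd.DataFrame(exp)
df.to_csv('experience.csv', index=False)  # index=False drops the row-number column
```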
Update 3
Using JavaScript we can avoid the try/except blocks: thanks to the optional chaining operator ?., if location or description is missing, the value will be None instead of *missing value*.
exp = {key: [] for key in ['job', 'company', 'date', 'location', 'description']}

jobs = driver.find_elements(By.CSS_SELECTOR, 'section:has(#experience)>div>ul>li')
for job in jobs:
    exp['job'] += [job.find_element(By.CSS_SELECTOR, 'span[class="mr1 t-bold"] span').text]
    exp['company'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal"] span').text]
    exp['date'] += [job.find_element(By.CSS_SELECTOR, 'span[class="t-14 t-normal t-black--light"] span').text]
    # ?. (optional chaining) yields undefined when the element is missing,
    # which execute_script returns to Python as None
    exp['location'] += [driver.execute_script(
        'return arguments[0].querySelector("span[class=\'t-14 t-normal t-black--light\']:nth-child(4) span")?.innerText', job)]
    exp['description'] += [driver.execute_script(
        'return arguments[0].querySelector("ul li ul span[aria-hidden=true]")?.innerText', job)]
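If you later want the same *missing value* placeholder as in the first snippet, the None entries can be filled with pandas after the fact. A minimal sketch with illustrative sample rows:

```python
import pandas as pd

# Example rows as produced by the JavaScript variant: missing fields are None
exp = {
    'job': ['Technical Recruiter', 'HR Manager'],
    'company': ['Acme Corp', 'Example Inc'],
    'date': ['2020 - Present', '2018 - 2020'],
    'location': ['Remote', None],
    'description': [None, 'Led hiring for engineering.'],
}

# fillna replaces every None with the placeholder string
df = pd.DataFrame(exp).fillna('*missing value*')
print(df.loc[1, 'location'])  # -> *missing value*
```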