Keep only an element of a webpage while web-scraping

Question:

I am trying to extract a table from a webpage with Python. I managed to get all the contents of that table, but since I am very new to web scraping, I don’t know how to keep only the elements that I am looking for.

I know that I should look for this class in the code: <a class="_3BFvyrImF3et_ZF21Xd8SC", which marks the items in the table.

So how can I keep only the elements with that class and then extract their titles?

<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a>
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a>

I failed miserably at writing code for that. I don’t know how to extract only these elements, so any input will be highly appreciated.

Asked By: Raphael Borges


Answers:

Okay, I found a very simple approach that worked.

Basically, I pasted the page source into VS Code and then selected all the occurrences of that class. Then I just had to copy and paste them into another file. I am not sure why the shortcut Ctrl + Shift + L did not work, but I managed to get what I needed.

Select all occurrences of selected word in VSCode
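
For anyone who would rather script this step than do it by hand, here is a minimal sketch that filters pasted HTML using only Python's standard library. It is not the method used above, just one possible equivalent. The names TitleCollector, extract_titles and sample_html are made up for illustration, the third anchor in the sample is an invented non-matching link, and the class name and the other two anchors are copied from the question.

```python
from html.parser import HTMLParser

# Class name copied from the question.
TARGET_CLASS = "_3BFvyrImF3et_ZF21Xd8SC"


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <a> tag that carries target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A class attribute may hold several space-separated class names.
        classes = (attr_map.get("class") or "").split()
        title = attr_map.get("title")
        if self.target_class in classes and title:
            self.titles.append(title)


def extract_titles(html, target_class=TARGET_CLASS):
    """Return the titles of all matching links, in document order."""
    parser = TitleCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.titles


# Two anchors from the question plus one invented link that should be skipped.
sample_html = """
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Python" href="/r/Python/">r/Python</a>
<a class="_3BFvyrImF3et_ZF21Xd8SC" title="r/Java" href="/r/Java/">r/Java</a>
<a class="other" title="ignored" href="/x/">x</a>
"""

print(extract_titles(sample_html))  # ['r/Python', 'r/Java']
```

To run it on a real page, replace sample_html with the saved page source, for example the text read from a local .html file.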

Answered By: Raphael Borges

To extract the values of the title attributes, you can use a list comprehension with either of the following locator strategies:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a._3BFvyrImF3et_ZF21Xd8SC[title]")])
    
  • Using XPATH:

    print([my_elem.get_attribute("title") for my_elem in driver.find_elements(By.XPATH, "//a[@class='_3BFvyrImF3et_ZF21Xd8SC' and @title]")])
    
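  • Note: Add the following imports (only By is used by the snippets above; WebDriverWait and expected_conditions are needed only if you add an explicit wait):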

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
Answered By: undetected Selenium