Selenium Python can only find the first element

Question

This is a follow-up to a question I posted yesterday, "Selenium Python unable to find web element". Using the answer given there, I can now return the first post of every thread on this forum, but I also need to return the replies, which I have been unable to do. This is the code that works so far for returning the first post in the thread at https://www.thestudentroom.co.uk/showthread.php?t=7263973:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium import webdriver

url = "https://www.thestudentroom.co.uk/showthread.php?t=7263973"

def get_posts(url):
    options = Options()
    options.add_argument("--headless")
    options.headless = True
    # Pass the options to the driver; otherwise the headless setting is ignored.
    driver = webdriver.Chrome(options=options)
    driver.maximize_window()
    wait = WebDriverWait(driver, 5)
    driver.get(url)
    posts = wait.until(EC.presence_of_element_located((By.XPATH, f"//div[@class='styles__PostContent-sc-1r7c0ap-3 kylDhV']/span")))
    print(posts.text)
    driver.quit()

SR_posts = get_posts(url = url)
SR_posts

To try and fetch the replies further down in that thread I have tried using the following:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium import webdriver

url = "https://www.thestudentroom.co.uk/showthread.php?t=7263973"

def get_posts(url):
    options = Options()
    options.add_argument("--headless")
    options.headless = True
    # Pass the options to the driver; otherwise the headless setting is ignored.
    driver = webdriver.Chrome(options=options)
    driver.maximize_window()
    wait = WebDriverWait(driver, 5)
    driver.get(url)
    #posts = wait.until(EC.presence_of_element_located((By.XPATH, '//*[contains(@class, "styles__PostContent-sc-1r7c0ap-3 kylDhV")]')))
    #posts = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*[contains(@class, "styles__PostContent-sc-1r7c0ap-3 kylDhV")]')))
    posts = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*[contains(@id, "post9")]/div[1]/div/span')))
    driver.quit()
    return posts.text

SR_posts = get_posts(url = url)
SR_posts

I have tried many variations of the above. I noticed that all of the reply posts contain ‘post9…’ in their id, so I attempted to use contains(@id, "post9"), but I consistently get errors, empty lists, or only the first post in the thread. Any help with this would be greatly appreciated.

Asked By: Kusanagi


Answer 1

Instead of matching the specific element //div[@class='styles__PostContent-sc-1r7c0ap-3 kylDhV']/span, you need to find all the posts on the page.
The additional posts are not all loaded automatically; to load them you need to scroll the page down. After that you can collect all the posts that are present and print their content.
The following code works:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 5)

url = "https://www.thestudentroom.co.uk/showthread.php?t=7263973"
driver.get(url)

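# Wait until the first post has rendered.
wait.until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class,'styles__PostContent')]/span")))
# Find the last "reply" button and scroll to it so the lazily loaded posts are rendered.
last_reply = wait.until(EC.presence_of_element_located((By.XPATH, "(//button[contains(.,'reply')])[last()]")))
last_reply.location_once_scrolled_into_view  # reading this property scrolls the element into view
# Collect every post that is now present and print its text.
posts = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'styles__PostContent')]/span")))
for post in posts:
    print(post.text)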

The result is:

Welcome to GYG 2022-23

Click here to make a blog



What is GYG?

Grow your Grades is a blogging competition that has run on TSR for 8 years now. Each year we ask students from all levels to blog about their journey from wherever they start at the start of the academic year to the end, sharing the highs, the lows and everything in-between. At the end of the year, we read them through and pick an overall winner among other prizes that we give out throughout the year, it pays to Grow Your Grades!

So, why should you enter?

Keeping a blog helps keep yourself motivated and accountable for your studying, plus the community spirit in GYG is always so encouraging and motivating for people! We know it can be difficult to get back into studying, but you can use GYG to challenge yourself and get back into the rhythm of studying

What else do I need to know?

We'll be offering more regular rep prizes, for everything from a really great blog post to being supportive in the forum on top of the usual spot prizes. Finally, we know how valuable your advice is for other students, so we will be offering prizes for your advice throughout the year. Not only will you see your advice help other students in our articles, there will also be prizes to be won. If you have any other ideas for GYG then please post them, we love to hear them!

I want to start a blog, but I don't know how to make it look pretty.

Firstly, to be in a chance of winning, then your blog doesn't have to be pretty! But if you want it to be so, then look at our guide here on how to work the new TSR editor, this will help you edit your posts to perfection

Do my blog posts have to be a certain length?

No This year in particular, we're aware students may have less time than normal, so we're encouraging you to keep your blogs short and sweet if that means you have more time for the slightly more important stuff like homework and school Or even chilling. You may like to consider writing a few sentences each day on the way home from school rather than a bigger post at the weekend, or adding a picture to show your to do list rather than typing it all out again. It's about how you use the words, not the number you use!

What should I include in my blog?

Whatever you like! Your subjects is always a good start, and what subject level, but after that you can include anything from your timetable, homework struggles, pictures of your revision (or your snacks). Anything relevant to growing your grades. Remember you don't have to have super long blog posts, and a picture can tell a thousand words (and save you time too!)

What kinds of prizes are there?

The main prizes are yet to be confirmed for 2022 , but we can confirm we will be having some extra rep prizes for categories such as "Best title", "Best blog from a newbie" and "Best use of photos in a blog". We'll also be running our usual prize for "Best supporter in GYG", and have a special bonus to this of a £10 amazon voucher for the person most supportive before October half term As usual, we may run spot prizes, and also may be awarding random prizes throughout the year if we spot someone being super helpful, making great blog posts or for any other reason. We'll announce these on this thread so everyone can see

When does the competition close?

We'll be closing the competition near the end of September this year. If you have any other questions, then do say below, we're always happy to answer

UPD
I could tell in two ways that the other replies are not loaded initially:

I expected the XPath //div[contains(@class,'styles__PostContent')]/span to match all the post contents, but in dev tools I saw only 1 match. After scrolling, however, I saw many more matches.
In dev tools I can see lazyload,

which means the elements there are loaded only when the page is scrolled.
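
If a single scroll does not bring in every reply, the same idea can be repeated until the number of matched posts stops growing. The following is only a minimal sketch under stated assumptions, not part of the tested answer: the helper name collect_post_texts, the max_rounds and pause parameters, and the choice to scroll to the last matched post (rather than the last reply button) are all hypothetical, and it assumes the site loads more posts whenever the last one comes into view. It uses the string "xpath", which is the value of By.XPATH, so the helper itself needs no Selenium imports.

```python
import time

# Same XPath the answer uses to match post bodies.
POST_XPATH = "//div[contains(@class,'styles__PostContent')]/span"


def collect_post_texts(driver, xpath=POST_XPATH, max_rounds=20, pause=1.0):
    """Scroll to the last matched post until no new posts appear, then return their texts.

    `driver` is an already-open Selenium WebDriver (hypothetical helper, not a Selenium API).
    """
    seen = 0
    for _ in range(max_rounds):
        posts = driver.find_elements("xpath", xpath)
        if not posts or len(posts) == seen:
            break  # nothing matched, or the previous scroll loaded nothing new
        seen = len(posts)
        # Bring the last loaded post into view so the page can lazy-load the next batch.
        driver.execute_script("arguments[0].scrollIntoView();", posts[-1])
        time.sleep(pause)  # crude fixed wait; assumed long enough for new posts to render
    # Read the text while the browser session is still open.
    return [post.text for post in driver.find_elements("xpath", xpath)]
```

With the driver from the code above, this would be called as collect_post_texts(driver) after driver.get(url) and before driver.quit(), because the text of an element can no longer be read once the session is closed.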

Answered By: Prophet
