Get text with Xpath

Question:

I want to collect GitHub users’ monthly contributions from 2004 until now as shown in the picture. enter image description here And output the monthly data into csv file with corresponding month columns (e.g., 2022_10). The Xpath of these texts is:

#//*[@id="js-contribution-activity"]/div/div/div/div/details/summary/span[1]

This is what my csv file (df1) looks like:

Here is my best try so far:

for index, row in df1.iterrows():
    try:
        user = row['user']
    except:
        pass
    for y in range(2004, 2023):
        for m in range(1, 13):
            try:
                current_url = f'https://github.com/{user}?tab=overview&from={y}-{m}-01&to={y}-{m}-31'
                print(current_url)
                driver.get(current_url)
                time.sleep(0.1)
                contribution = driver.findElement(webdriver.By.xpath("//*[@id='js-contribution-activity']/div/div/div/div/details/summary/span[1]")).getText();
                df1.loc[index, f'{str(y)}_{str(m)}'] = contribution
            except:
                pass

print(df1)
df1.to_csv('C:/Users/fredr/Desktop/output today.csv')

I cannot figure out why there is no output. Thanks for your help.

Asked By: KDWB

||

Answers:

I haven’t tried it with selenium but with just requests and lxml this xpath expression

//div[@class="contribution-activity-listing float-left col-12 "]//details[@class="Details-element details-reset"]/summary/span[1]

seems to work.

Answered By: Jack Fleeting

You need to use WebDriverWait expected_conditions explicit waits.
I see there are multiple contribution fields there, so you need to collect all those elements as a list and then to iterate over the list extracting each element text.
You need to improve your locators, they should be short and clear as possible.
Also you mixed in your code Java and Python. getText() and ; are from Java…
Try this:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

driver.get(current_url)
contributions = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='js-contribution-activity']//summary/span[1]")))
for contribution in contributions:
    print(contribution.text)
Answered By: Prophet