How to click a button on a webpage and iterate through the contents after clicking it using Python Selenium

Question:

I am using Python Selenium to scrape https://finance.yahoo.com/quote/AAPL/balance-sheet?p=AAPL, but I want the Quarterly data instead of the Annual data, which means clicking the "Quarterly" button at the top right first. This is my code so far:

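from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

def readQuarterlyBSData(ticker):
    url = f'https://finance.yahoo.com/quote/{ticker}/balance-sheet?p={ticker}'
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(url)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    ls = []
    # Trying to iterate through each div after clicking on the Quarterly button but content is still Annual Data
    for element in soup.find_all('div'):
        ls.append(element.string)  # add each element one by one to the list
    driver.quit()
    return ls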

I am able to get the button to click, but when I iterate through the divs I still get the Annual data, not the Quarterly data. Can someone show me how I can iterate through the Quarterly data?

Asked By: AJ Goudel


Answers:

soup = BeautifulSoup(driver.page_source, 'lxml')

You don’t need to pass driver.page_source to BS4; you can use Selenium itself to extract the data with driver.find_element (or driver.find_elements when you want every match).

Here is the doc on that: https://selenium-python.readthedocs.io/locating-elements.html
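A minimal sketch of that idea, assuming you have already clicked Quarterly and waited for the table to re-render. The function name read_rows and the assumption that each table row's visible text lists one cell per line are illustrative, not taken from Yahoo's markup; the row selector is left to the caller because Yahoo changes its page structure often. To keep the sketch importable without Selenium installed, it passes the literal string that By.CSS_SELECTOR stands for.

```python
# Hypothetical helper: read table rows through Selenium instead of re-parsing
# driver.page_source with BeautifulSoup.
# Assumptions: row_selector matches one element per table row, and each row's
# visible .text has one cell per line (label first, then the period values).

CSS_SELECTOR = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def read_rows(driver, row_selector: str) -> dict[str, list[str]]:
    """Map each row label (first cell) to the texts of the remaining cells."""
    rows = {}
    for element in driver.find_elements(CSS_SELECTOR, row_selector):
        cells = element.text.split("\n")
        if cells[0]:  # skip rows with no visible text
            rows[cells[0]] = cells[1:]
    return rows
```

In real code you would import By and write driver.find_elements(By.CSS_SELECTOR, row_selector), passing a selector you copy from the browser's developer tools.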

Also, you are not waiting for the page to update after the click. Your WebDriverWait only waits for the button to become clickable; after the click, the page still has to re-render, but you immediately pass driver.page_source, which has not been updated yet. So wait before reading it:

import time

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="Col1-1-Financials-Proxy"]/section/div[1]/div[2]/button'))).click()
time.sleep(10)  # crude fixed delay so the Quarterly table can render before reading the source
soup = BeautifulSoup(driver.page_source, 'lxml')
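A fixed sleep either wastes time or turns out to be too short. A sketch of an explicit wait instead: WebDriverWait.until accepts any callable that takes the driver and keeps calling it until it returns something truthy (raising TimeoutException if the timeout expires), so a small condition class can wait until the page source differs from a snapshot taken just before the click. The class name page_source_changed is hypothetical, written in the lowercase style of Selenium's own expected_conditions.

```python
# Hypothetical custom condition for Selenium's WebDriverWait.
# WebDriverWait(driver, timeout).until(method) calls method(driver) repeatedly
# until it returns a truthy value, so any callable taking the driver works.


class page_source_changed:
    """Truthy once driver.page_source differs from a snapshot taken before the click."""

    def __init__(self, old_source: str) -> None:
        self.old_source = old_source

    def __call__(self, driver) -> bool:
        # Compare the current page source with the pre-click snapshot.
        return driver.page_source != self.old_source


# Usage with the question's code (needs a live driver, so shown as comments;
# xpath stands for the question's button XPath):
# button = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, xpath)))
# old_source = driver.page_source
# button.click()
# WebDriverWait(driver, 20).until(page_source_changed(old_source))
# soup = BeautifulSoup(driver.page_source, 'lxml')
```

The page source can also change for unrelated reasons (ads, lazily loaded widgets), so a condition tied to the quarterly content itself, such as EC.text_to_be_present_in_element on a cell you expect only in the Quarterly view, is the more precise choice.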

Hope it helps 🙂

Answered By: Reincoder

No, no, no. Do it this way!

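import yfinance as yf  # balance_sheet / quarterly_balance_sheet are attributes of yfinance's Ticker

aapl = yf.Ticker('AAPL')

# show balance sheet
print(aapl.balance_sheet)            # annual
print(aapl.quarterly_balance_sheet)  # quarterly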

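This is what the quarterly call returns (values in US dollars; the columns are Apple's fiscal quarter-end dates, and pandas wraps the wide frame into two blocks):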

                                                2022-09-24    2022-06-25  
Total Liab                                    3.020830e+11  2.782020e+11   
Total Stockholder Equity                      5.067200e+10  5.810700e+10   
Other Current Liab                            6.709400e+10  5.653900e+10   
Total Assets                                  3.527550e+11  3.363090e+11   
Common Stock                                  6.484900e+10  6.211500e+10   
Other Current Assets                          2.122300e+10  1.638600e+10   
Retained Earnings                            -3.068000e+09  5.289000e+09   
Other Liab                                    3.839400e+10  5.362900e+10   
Gains Losses Not Affecting Retained Earnings -1.110900e+10 -9.297000e+09   
Other Assets                                  4.401100e+10  5.260500e+10   
Cash                                          2.364600e+10  2.750200e+10   
Total Current Liabilities                     1.539820e+11  1.298730e+11   
Short Long Term Debt                          1.112800e+10  1.400900e+10   
Other Stockholder Equity                     -1.110900e+10 -9.297000e+09   
Property Plant Equipment                      5.253400e+10  4.033500e+10   
Total Current Assets                          1.354050e+11  1.122920e+11   
Long Term Investments                         1.208050e+11  1.310770e+11   
Net Tangible Assets                           5.067200e+10  5.810700e+10   
Short Term Investments                        2.465800e+10  2.072900e+10   
Net Receivables                               6.093200e+10  4.224200e+10   
Long Term Debt                                9.895900e+10  9.470000e+10   
Inventory                                     4.946000e+09  5.433000e+09   
Accounts Payable                              6.411500e+10  4.834300e+10   

                                                2022-03-26    2021-12-25  
Total Liab                                    2.832630e+11  3.092590e+11  
Total Stockholder Equity                      6.739900e+10  7.193200e+10  
Other Current Liab                            5.816800e+10  5.704300e+10  
Total Assets                                  3.506620e+11  3.811910e+11  
Common Stock                                  6.118100e+10  5.842400e+10  
Other Current Assets                          1.580900e+10  1.811200e+10  
Retained Earnings                             1.271200e+10  1.443500e+10  
Other Liab                                    5.243200e+10  5.505600e+10  
Gains Losses Not Affecting Retained Earnings -6.494000e+09 -9.270000e+08  
Other Assets                                  5.195900e+10  5.010900e+10  
Cash                                          2.809800e+10  3.711900e+10  
Total Current Liabilities                     1.275080e+11  1.475740e+11  
Short Long Term Debt                          9.659000e+09  1.116900e+10  
Other Stockholder Equity                     -6.494000e+09 -9.270000e+08  
Property Plant Equipment                      3.930400e+10  3.924500e+10  
Total Current Assets                          1.181800e+11  1.531540e+11  
Long Term Investments                         1.412190e+11  1.386830e+11  
Net Tangible Assets                           6.739900e+10  7.193200e+10  
Short Term Investments                        2.341300e+10  2.679400e+10  
Net Receivables                               4.540000e+10  6.525300e+10  
Long Term Debt                                1.033230e+11  1.066290e+11  
Inventory                                     5.460000e+09  5.876000e+09  
Accounts Payable                              5.268200e+10  7.436200e+10  
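The result is an ordinary pandas DataFrame with line items as the index and period-end dates as columns, so individual rows can be pulled out with .loc. A small sketch, assuming that layout; the helper name row_in_billions is hypothetical, and the sample frame reuses two Total Assets values from the output above so it runs offline. Row labels can differ between library versions, so check the index of your own result before relying on a particular label.

```python
import pandas as pd


def row_in_billions(balance_sheet: pd.DataFrame, label: str) -> pd.Series:
    """Return one balance-sheet row, converted from US dollars to billions of US dollars.

    Assumes line items are the index and period-end dates are the columns.
    """
    return balance_sheet.loc[label] / 1e9


# Two Total Assets values copied from the output above, so the example runs offline.
sample = pd.DataFrame(
    {"2022-09-24": [3.527550e11], "2022-06-25": [3.363090e11]},
    index=["Total Assets"],
)
print(row_in_billions(sample, "Total Assets"))
```

With live data, the same call would be row_in_billions(aapl.quarterly_balance_sheet, 'Total Assets').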

There’s all kinds of financial data available from the ‘pandas_datareader’ library.

https://pandas-datareader.readthedocs.io/en/latest/readers/yahoo.html

Answered By: ASH