How to pull an HTML table from a webpage using a requests session?

Question:

I’m new to Python, so I appreciate any help I can get. I have a working script that pulls a table from a webpage by XPath and puts that information into a dataframe. Because I’m pulling this information for hundreds of parts, it’s time consuming. I’d like to convert the code to use a requests session (HTMLSession) to save time, but I’m having trouble getting it to work.

Current Code:

wd = webdriver.Firefox()

#Login to website
....
....

#Get Job Sim information for each part number in list
for pn, qty in zip(part_number_series, bld_qty_series):

    # Open Job Sim URL for PEN w/ pn and qty
    job_sim_url = "(url link)"
    wd.get(job_sim_url)

    # Policy: Wait for Job Sim page to fully load
    job_sim_table_element = WebDriverWait(wd, 20).until(
        EC.presence_of_element_located(
            (By.XPATH,
                "/html/body/form/div[2]/div/div[2]",
            )))

    # Pull Job Sim Information into Pandas
    job_sim_table_element = WebDriverWait(wd, 20).until(
        EC.presence_of_element_located((By.XPATH, "//table[2]"))
    )
    job_sim_table_html = job_sim_table_element.get_attribute("outerHTML")
    job_sim_df = pd.read_html(job_sim_table_html, header=0, index_col=None)[0]

    job_sim_df.columns = [
        "Part Number",
        "Description",
        "Item Status",
        "Critical Item",
        "UOM",
        "PEN Leadtime",
        "Qty Per",
        "Ext. Qty",
        "PEN Available Qty",
        "PEN Qty OnHand",
        "PEN Subinventory",
        "PEN Planner Code",
        "PEN Sourced From",
        "PEN Buyer",
        "PO Detail",
    ]

New code:

wd = webdriver.Firefox()

#Login to website
....
....
cookies = wd.get_cookies()
requests_session= HTMLSession()
for cookie in cookies:
    requests_session.cookies.set(cookie['name'], cookie['value'])
# %%
for pn in Part_Number_Series[:]:
    job_sim_url = "(url link)"
    response = requests_session.get(job_sim_url)
    job_sim_table_element = response.html.xpath('//table[2]')

    job_sim_table_html = job_sim_table_element.get_attribute("outerHTML")
    job_sim_df = pd.read_html(job_sim_table_html, header=0, index_col=None)[0]

    job_sim_df.columns = [
        "Part Number",
        "Description",
        "Item Status",
        "Critical Item",
        "UOM",
        "PEN Leadtime",
        "Qty Per",
        "Ext. Qty",
        "PEN Available Qty",
        "PEN Qty OnHand",
        "PEN Subinventory",
        "PEN Planner Code",
        "PEN Sourced From",
        "PEN Buyer",
        "PO Detail",
    ]

I believe the response does pull the table into job_sim_table_element, but as a list, so I get an error saying that a list has no "get_attribute". I’ve looked at BeautifulSoup and a few other options, but so far I haven’t gotten anything to work. Thanks in advance for your help!

Asked By: Stacy Malik

||

Answers:

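My son helped me get the code to work with requests, and I wanted to post the answer in case it’s useful to others. The XPath call returns a list of elements, so instead of calling get_attribute on it, the fix is to read the .html of each element in the list and join the results into one string for pandas.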

new code:

import pandas as pd
from requests_html import HTMLSession
from selenium import webdriver

wd = webdriver.Firefox()

#Login to website
....
....
cookies = wd.get_cookies()
requests_session= HTMLSession()
for cookie in cookies:
    requests_session.cookies.set(cookie['name'], cookie['value'])
# %%
for pn in Part_Number_Series[:]:
    job_sim_url = "(url link)"
    response = requests_session.get(job_sim_url)
    job_sim_table_element = response.html.xpath('//table[2]')
    job_sim_table_html = "\n".join([elem.html for elem in job_sim_table_element])
    job_sim_df = pd.read_html(job_sim_table_html, header=0, index_col=None)[0]
    
    job_sim_df.columns = [
        "Part Number",
        "Description",
        "Item Status",
        "Critical Item",
        "UOM",
        "PEN Leadtime",
        "Qty Per",
        "Ext. Qty",
        "PEN Available Qty",
        "PEN Qty OnHand",
        "PEN Subinventory",
        "PEN Planner Code",
        "PEN Sourced From",
        "PEN Buyer",
        "PO Detail",
    ]
Answered By: Stacy Malik
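
For anyone who wants to try the list handling without logging in to a site, here is a minimal sketch of the same idea. It is not the code from the answer: it swaps in lxml directly instead of requests_html, and the function name second_table_to_df and the sample page are made up for illustration. It assumes lxml and pandas are installed. It takes the first match of the XPath (which is what the Selenium version effectively did) and wraps the HTML in StringIO, since recent pandas versions warn when read_html is given a raw HTML string.

```python
from io import StringIO

import lxml.html
import pandas as pd


def second_table_to_df(page_html: str) -> pd.DataFrame:
    """Parse the first element matched by //table[2] into a DataFrame.

    Hypothetical helper for illustration only.
    """
    tree = lxml.html.fromstring(page_html)
    matches = tree.xpath("//table[2]")  # xpath() returns a list, possibly empty
    if not matches:
        raise ValueError("no element matched //table[2]")
    # Serialize only the first matching element back to an HTML string
    table_html = lxml.html.tostring(matches[0], encoding="unicode")
    # Pass a file-like object rather than a bare string
    return pd.read_html(StringIO(table_html), header=0, index_col=None)[0]


# Made-up page standing in for the Job Sim response body
sample_page = """
<html><body>
  <table><tr><th>Ignored</th></tr><tr><td>x</td></tr></table>
  <table>
    <tr><th>Part Number</th><th>Qty Per</th></tr>
    <tr><td>ABC-1</td><td>2</td></tr>
    <tr><td>ABC-2</td><td>5</td></tr>
  </table>
</body></html>
"""

job_sim_df = second_table_to_df(sample_page)
print(job_sim_df)
```

Note that //table[2] matches every table that is the second table under its own parent, so it can return more than one element on a nested page; (//table)[2] selects the second table in the whole document if that is what is wanted.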