Turn table-element into Pandas DataFrame

Question:

I would like to turn a table into a pandas.DataFrame.

URL = 'https://ladieseuropeantour.com/reports-page/?tourn=1202&tclass=rnk&report=tmscores~season=2015~params=P*4ESC04~#/profile'

The element in question is

from selenium import webdriver
from selenium.webdriver.common.by import By

driver.get(URL)
ranking = driver.find_element(By.XPATH, ".//*[@id='maintablelive']")

I tried the following:

import pandas as pd

pd.read_html(ranking.get_attribute('outerHTML'))[0]

I am also using the dropdown-menu to select multiple rounds. When a different round is selected, driver.current_url doesn’t change so I think it’s not possible to load these new tables with requests or anything.

Please advice!

Asked By: HJA24

||

Answers:

Instead of using selenium, you want to access the URL’s API endpoint.

Finding the API endpoint

You can trace it as follows:

  • Open the URL in Chrome
  • Use Ctrl + Shift + J to open DevTools, navigate to Network, select Fetch/XHR from the sub navbar, and refresh the URL.
  • This will reload the network connections, and when you click on one of the lines that will appear, you can select Response from the second sub navbar to see if they are returning data.
  • Going through them, we can locate the connection that is responsible for returning the data for the table, namely https://ladieseuropeantour.com/api/let/cache/let/2015/2015-1202-scores-P*4ESC04.json (you can leave out the query part ?randomadd=1673086921319 at the end).

DevTools

Now, with this knowledge, we can simply load all the data with requests, investigate the type of information that is contained in the JSON and dump the required sub part in a df.

For example, let’s recreate the table from your URL. This one:

Table URL

Code

import requests
import pandas as pd

url = 'https://ladieseuropeantour.com/api/let/cache/let/2015/2015-1202-scores-P*4ESC04.json'

data = requests.get(url).json()

df = pd.DataFrame(data['scores']['scores_entry'])
cols = ['pos','name','nationality','vspar',
        'score_R1','score_R2','score_R3','score_R4', 'score']

df_table = df.loc[:,cols]
df_table.head()

  pos                    name nationality vspar score_R1 score_R2 score_R3  
0   1          Lydia Ko (NZL)         NZL    -9       70       70       72   
1   2          Amy Yang (KOR)         KOR    -7       73       70       70   
2   3  Ariya Jutanugarn (THA)         THA    -4       69       71       72   
3   4        Jenny Shin (KOR)         KOR    -2       76       71       74   
4   4         Ilhee Lee (KOR)         KOR    -2       68       82       69   

  score_R4 score  
0       71   283  
1       72   285  
2       76   288  
3       69   290  
4       71   290  

If you check print(df.columns), you’ll see that the main df contains all the data that lies behind all of the tables you can select on the URL. So, have a go at that to select whatever you are looking for.

Answered By: ouroboros1
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.