How to pull a specific "data-stat" value? (Python)
Question:
So far, my code pulls up a page from https://www.basketball-reference.com and grabs every cell in tr_body that has a data-stat attribute.
I need a way to pull a specific data-stat value. For example, on https://www.basketball-reference.com/players/l/lowryky01.html, to find the player's position I would want the cell with data-stat="pos".
Here’s what I tried:
from bs4 import BeautifulSoup

# source holds the HTML of the player page, fetched earlier
soup = BeautifulSoup(source, 'lxml')
tbody = soup.find('tbody')
pergame = tbody.find(class_="full_table")
classrite = pergame.find(class_="right")
tr_body = tbody.find_all('tr')
print(pergame)
# print each cell's text and its data-stat; .get() reads any HTML attribute, not just classes
for trb in tr_body:
    print(trb.get('id'))
    th = trb.find('th')
    print(th.get_text())
    print(th.get('data-stat'))
    for td in trb.find_all('td'):
        print(td.get_text())
        print(td.get('data-stat'))
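A note on the API used above: `data-stat` is an HTML attribute rather than a class. BeautifulSoup's `Tag.get()` returns the value of any attribute by name, and `find()`/`find_all()` can filter on an attribute directly through their `attrs` argument. The snippet below is a minimal sketch on a small hand-written HTML fragment. The fragment, including its `id` value, is a hypothetical stand-in for one row of the real table and is not copied from the site. It is parsed with the built-in `html.parser` so `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like one table row; the real page has many more cells.
html = """
<table><tbody>
  <tr id="per_game.2007">
    <th data-stat="season">2006-07</th>
    <td data-stat="team_id">MEM</td>
    <td data-stat="pos">PG</td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# .get() reads an attribute by name and returns None if it is missing.
print(row.get("id"))        # per_game.2007
print(row.get("missing"))   # None

# Filter on the attribute directly instead of looping over every cell.
pos_cell = row.find("td", attrs={"data-stat": "pos"})
print(pos_cell.get_text())  # PG

# A CSS attribute selector does the same thing.
print(soup.select_one('td[data-stat="pos"]').get_text())  # PG
```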
Answers:
Well, from what I can tell, you’ve basically already done what you wanted.
From here, collect each row's cells into a dictionary keyed by their data-stat attribute, and then you can look up any value by its key.
for trb in tr_body:
    print(trb.get('id'))
    th = trb.find('th')
    print(th.get_text())
    print(th.get('data-stat'))
    # map each cell's data-stat attribute to its text for this row
    row = {}
    for td in trb.find_all('td'):
        row[td.get('data-stat')] = td.get_text()
    print(row['pos'], row['team_id'], row['fg_pct'])
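The dictionary approach can also be wrapped in a small reusable function. This is a minimal sketch under a few assumptions. The function name `rows_by_data_stat` is hypothetical. The HTML is a hand-written two-row fragment standing in for the real table, and its values are placeholders, not real stats. `html.parser` is used so `lxml` is not needed. Unlike the loop above, it also stores the `<th>` cell (the season) and looks values up with `row.get()`, so a row that lacks a stat yields `None` instead of raising `KeyError`.

```python
from bs4 import BeautifulSoup


def rows_by_data_stat(tbody):
    """Turn each <tr> in tbody into a dict mapping data-stat -> cell text (hypothetical helper)."""
    rows = []
    for tr in tbody.find_all("tr"):
        # Include both the <th> (e.g. the season) and the <td> cells.
        cells = tr.find_all(["th", "td"])
        row = {c.get("data-stat"): c.get_text(strip=True)
               for c in cells if c.get("data-stat")}
        if row:  # skip rows with no data-stat cells at all
            rows.append(row)
    return rows


# Hypothetical fragment; the second row deliberately lacks fg_pct.
html = """
<table><tbody>
  <tr><th data-stat="season">2006-07</th><td data-stat="pos">PG</td><td data-stat="fg_pct">.450</td></tr>
  <tr><th data-stat="season">2007-08</th><td data-stat="pos">PG</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = rows_by_data_stat(soup.find("tbody"))
for row in rows:
    # .get() returns None instead of raising KeyError when a stat is absent.
    print(row.get("season"), row.get("pos"), row.get("fg_pct"))
# 2006-07 PG .450
# 2007-08 PG None
```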
Hope this helps.