Beautiful Soup Scraping
Question:
I am trying to scrape lineups from https://www.rotowire.com/hockey/nhl-lineups.php
I would like a resulting dataframe like the following:
Team | Position | Player | Line |
---|---|---|---|
CAR | C | Sebastian Aho | Power Play #1 |
CAR | LW | Stefan Noesen | Power Play #1 |
….
This is what I have currently, but I am unsure how to match the team and line up with the players/positions, and how to put the result into a dataframe:
import requests, pandas as pd
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/hockey/nhl-lineups.php"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
lineups = soup.find_all('div', {'class':['lineups']})[0]

names = lineups.find_all('a', title=True)
for name in names:
    name = name.get('title')
    print(name)

positions = lineups.find_all('div', {'class':['lineup__pos']})
for pos in positions:
    pos = pos.text
    print(pos)
Answers:
Try:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.rotowire.com/hockey/nhl-lineups.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
for a in soup.select(".lineup__player a"):
    name = a["title"]
    # Walk backwards in document order from each player link to its context.
    pos = a.find_previous("div").text
    line = a.find_previous(class_="lineup__title").text
    # The last class on the enclosing list is reused to find the matching team block.
    lineup = a.find_previous(class_="lineup__list")["class"][-1]
    team = a.find_previous(class_=f"lineup__team {lineup}").img["alt"]
    all_data.append((team, pos, name, line))

df = pd.DataFrame(all_data, columns=["Team", "Pos", "Player", "Line"])
# to_markdown() requires the optional tabulate package.
print(df.to_markdown(index=False))
Prints:
Team | Pos | Player | Line |
---|---|---|---|
CAR | C | Sebastian Aho | POWER PLAY #1 |
CAR | LW | Stefan Noesen | POWER PLAY #1 |
CAR | RW | Andrei Svechnikov | POWER PLAY #1 |
CAR | LD | Brent Burns | POWER PLAY #1 |
CAR | RD | Martin Necas | POWER PLAY #1 |
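To see why the backwards lookups pair each player with the right team and line, here is a minimal offline sketch that runs the same `find_previous` calls on a small inline snippet. It rests on assumptions: only the class names (`lineup__player`, `lineup__pos`, `lineup__title`, `lineup__list`, `lineup__team`) come from the answer above; the nesting, the `is-visit`/`is-home` markers, the `XYZ` team, and `Example Player` are made up for illustration, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup. Only the class names used in the
# answer are taken from it; the nesting, the "is-visit"/"is-home" markers,
# the "XYZ" team and "Example Player" are assumptions for illustration.
SAMPLE_HTML = """
<div class="lineup">
  <div class="lineup__team is-visit"><img alt="CAR"></div>
  <div class="lineup__team is-home"><img alt="XYZ"></div>
  <ul class="lineup__list is-visit">
    <li class="lineup__title">POWER PLAY #1</li>
    <li class="lineup__player"><div class="lineup__pos">C</div><a title="Sebastian Aho">S. Aho</a></li>
    <li class="lineup__player"><div class="lineup__pos">LW</div><a title="Stefan Noesen">S. Noesen</a></li>
  </ul>
  <ul class="lineup__list is-home">
    <li class="lineup__title">POWER PLAY #1</li>
    <li class="lineup__player"><div class="lineup__pos">C</div><a title="Example Player">E. Player</a></li>
  </ul>
</div>
"""


def extract_rows(html):
    """Return (team, pos, player, line) tuples using the answer's lookups."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for a in soup.select(".lineup__player a"):
        # Nearest <div> before the link in document order: the position label.
        pos = a.find_previous("div").text
        # Nearest preceding section title.
        line = a.find_previous(class_="lineup__title").text
        # Last class of the enclosing list; it pairs the list with a team block.
        marker = a.find_previous(class_="lineup__list")["class"][-1]
        # A class_ string containing a space is compared against the full
        # class attribute, so the other team's block is skipped.
        team = a.find_previous(class_=f"lineup__team {marker}").img["alt"]
        rows.append((team, pos, a["title"], line))
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting tuples can be passed to `pd.DataFrame(rows, columns=["Team", "Pos", "Player", "Line"])` exactly as in the answer. Because every lookup depends on document order, it is worth spot-checking a few rows against the live page if its markup changes.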