How to scrape a specific name from a link with bs4

Question:

I’m trying to use bs4 to scrape this webpage to get the title and the rating of each episode. I already have the ratings working, using the following code:

import requests
from bs4 import BeautifulSoup

first_url = 'https://www.imdb.com/search/title/?series=tt0206512&view=simple&sort=release_date,asc'

page = requests.get(first_url)

soup = BeautifulSoup(page.content, 'html.parser')

# collect the rating cell (div.col-imdb-rating) of every listed episode
ratings = soup.find_all("div", {"class": "col-imdb-rating"})


However, when I try to select by the ‘a’ tag, it doesn’t quite work. Does anyone have suggestions on how to get each episode name from this page?

For example, the text I’m looking for is: "Help Wanted/Reef Blower/Tea at the Treedome"


Asked By: bmatt23


Answers:

When a URL is given as /some/folder/somepage, it is relative to the site root (https://www.imdb.com in this case). So take the href value from the <a> tag and append it to the root to get the full URL, for example https://www.imdb.com/title/tt0707293/?ref_=adv_li_tt.
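As a minimal sketch of that step, the standard library's urljoin can do the joining instead of manual string concatenation. The names BASE_URL and absolute_url are hypothetical, introduced only for illustration; the href is the one quoted above.

```python
from urllib.parse import urljoin

# Site root, taken from the URL in the question.
BASE_URL = "https://www.imdb.com"


def absolute_url(href, base=BASE_URL):
    """Resolve a root-relative href (one starting with '/') against the site root."""
    return urljoin(base, href)


episode_url = absolute_url("/title/tt0707293/?ref_=adv_li_tt")
print(episode_url)
```

urljoin is used here rather than base + href because it also copes with hrefs that are already absolute, which plain concatenation would mangle.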

Answered By: eccentricOrange

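There are many a elements on the page, so selecting by a alone matches far more than the episode links. Each episode link sits right after a small element, so that neighbour can be used to narrow the match. In a CSS selector, the + sign is the adjacent sibling combinator: small + a matches an a element that immediately follows a small element under the same parent. Try this: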

episodes = soup.select("div.lister-item small + a[href]")
for episode in episodes:
    print(episode.text)
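
To show the selector in isolation, and one way to pair each title with its rating, here is a small sketch that runs against an inline HTML string rather than the live page. The markup in SAMPLE_HTML is a simplified assumption about the page layout, not a copy of it; the second episode, its href and both rating values are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not the real page source).
# The second item and both rating values are placeholders.
SAMPLE_HTML = """
<div class="lister-item">
  <a href="/title/tt0206512/">SpongeBob SquarePants</a>
  <small>Episode:</small>
  <a href="/title/tt0707293/?ref_=adv_li_tt">Help Wanted/Reef Blower/Tea at the Treedome</a>
  <div class="col-imdb-rating"><strong>8.0</strong></div>
</div>
<div class="lister-item">
  <a href="/title/tt0206512/">SpongeBob SquarePants</a>
  <small>Episode:</small>
  <a href="/title/tt0000000/">Placeholder Episode</a>
  <div class="col-imdb-rating"><strong>7.5</strong></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

results = []
for item in soup.select("div.lister-item"):
    # Only the <a> directly following <small> is the episode link;
    # the series link comes before <small>, so it is not matched.
    link = item.select_one("small + a[href]")
    rating = item.select_one("div.col-imdb-rating")
    if link is None:
        continue  # skip items without an episode link
    score = rating.get_text(strip=True) if rating else None
    results.append((link.get_text(strip=True), score))

for title, score in results:
    print(title, score)
```

Looping over each div.lister-item keeps every title next to its own rating, which is safer than zipping two separate lists if some item is missing one of the two.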
Answered By: Jordy