Did something break with beautifulsoup element extraction?

Question:

Classic case of code that used to work: I changed nothing, and now it no longer works. I’m trying to extract a list of unique appid values from this page, which I’m saving locally as roguelike.html

The code I have looks like this. It worked a couple of months ago when I last ran it, but now the end result is a list of length 1 containing just None. Any ideas as to what’s going wrong here?

from bs4 import BeautifulSoup


text_file = open("roguelike.html", "rb")
steamdb_text = text_file.read()
text_file.close()

soup = BeautifulSoup(steamdb_text, "html.parser")

trs = [tr for tr in soup.find_all('tr')]

apps = []

for app in soup.find_all('tr'):
    apps.append(app.get('data-appid'))
appset = list(set(apps))

Is there a simpler way to get the unique appids from the page source? The individual elements I’m trying to cycle over and grab look like:

<tr class="app" data-appid="98821" data-cache="1533726913">

where I want all the unique data-appid values. I’m scratching my head trying to figure out whether the formatting of the page changed (it doesn’t seem like it), or whether some kind of version upgrade in Spyder, Python, or BeautifulSoup broke something that used to work.

Any ideas?

Asked By: AI52487963


Answers:

I tried this code and it worked well for me. Make sure the HTML file you saved is actually the page you expect. Perhaps you hit a CAPTCHA page when downloading it, in which case the saved file would contain no app rows.
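One way to check this is sketched below, assuming the rows look like the `<tr class="app" data-appid="...">` element quoted in the question. The helper names `unique_appids` and `diagnose` and the sample HTML are made up for illustration. Filtering on the attribute also keeps `None` out of the result, since header rows without `data-appid` are skipped.

```python
from bs4 import BeautifulSoup


def unique_appids(html):
    """Return the sorted unique data-appid values found on <tr> elements."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs={"data-appid": True} matches only rows that carry the attribute,
    # so rows without it never contribute a None.
    rows = soup.find_all("tr", attrs={"data-appid": True})
    return sorted({row["data-appid"] for row in rows})


def diagnose(html):
    """Summarize what the saved file contains (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    return {
        "title": title,
        "tr_count": len(soup.find_all("tr")),
        "appid_rows": len(soup.find_all("tr", attrs={"data-appid": True})),
    }


# Made-up sample modeled on the row shown in the question.
sample = """
<table>
<tr><th>Name</th></tr>
<tr class="app" data-appid="98821" data-cache="1533726913"><td>A</td></tr>
<tr class="app" data-appid="98821" data-cache="1533726913"><td>A again</td></tr>
<tr class="app" data-appid="12345" data-cache="1533726913"><td>B</td></tr>
</table>
"""

print(unique_appids(sample))
print(diagnose(sample))
```

If `diagnose` reports zero `appid_rows` for roguelike.html, or a title that does not look like the expected page, the saved file is probably not the page you meant to capture, and the parsing code itself is likely fine.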

Answered By: toppk