Scrape a sub-attribute with bs4 in Python?

Question:

I’m trying to scrape the IDs on a website, but I can’t figure out how to specify the entry I want to work with. The most I could do was narrow it down to a specific class, but I’m not sure how to target the number stored under ‘id’ inside the ‘data-preview’ attribute. Here’s what I’ve narrowed the variable soup down to:

<li class="Li FnPreviewItem" data-preview='{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png",  }'>
<div class="Li Inner FnImage">
<span class="Image" style="background-image:url(www.website.com/image.png);"></span>
</div>
<div class="ImgPreview FnPreviewImage MdNonDisp">
<span class="Image FnPreview" style="background-image:url(www.website.com/image.png);">
</span></div>
</li>

Here is the relevant snippet of what I have so far:

from pathlib import Path
from bs4 import BeautifulSoup
import requests
import re

url = "www.website.com/image.png"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

elsoupo = soup.find(attrs={"class": "a fancy title for this class"})
print(elsoupo)

I just started working with Python, so hopefully I’m wording this in a way that makes sense.

I tried to narrow it down with a second attribute filter that would match any number, but I just get None back.

elsoupoNum = elsoupo.find(attrs={"id":"^[-+]?[0-9]+$"})

print(elsoupoNum)
Asked By: a.perez

||

Answers:

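data-preview is an attribute of the li element, and its value is an (ill-formed) JSON string: the trailing comma after the last entry is not valid JSON. The id is a key inside that string rather than an HTML attribute, which is why find(attrs={"id": ...}) returns None. I removed the trailing comma below for simplicity; you may want to check what the actual page contains.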

code

from bs4 import BeautifulSoup
import json

# Sample markup from the question, with the trailing comma removed from the JSON.
# Named html rather than str to avoid shadowing the built-in str type.
html = '''
<li class="Li FnPreviewItem" data-preview='{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png"  }'>
<div class="Li Inner FnImage">
<span class="Image" style="background-image:url(www.website.com/image.png);"></span>
</div>
<div class="ImgPreview FnPreviewImage MdNonDisp">
<span class="Image FnPreview" style="background-image:url(www.website.com/image.png);">
</span></div>
</li>
'''

soup = BeautifulSoup(html, 'html.parser')
# Select the first <li> that carries a data-preview attribute
li = soup.select_one('li[data-preview]')
# The attribute value is a JSON string
data = li.attrs['data-preview']
print(data)
# Parse the JSON and read its "id" key
preview = json.loads(data)
print(preview['id'])

output

{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png"  }
288857982
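
If the real page keeps the trailing comma, json.loads will raise an error on the raw attribute value. Here is a minimal sketch of one way to cope with that and to collect the id from every matching li, not just the first. The helper name extract_preview_ids, the regex-based repair, and the second sample li (with its made-up id 288857983) are assumptions for illustration, not something taken from the actual site.

```python
import json
import re

from bs4 import BeautifulSoup

# Sample markup: the first <li> keeps the trailing comma from the question,
# the second <li> is invented purely for illustration.
HTML = '''
<ul>
<li class="Li FnPreviewItem" data-preview='{ "type" : "animation", "id" : "288857982", "staticUrl" : "www.website.com/image.png",  }'></li>
<li class="Li FnPreviewItem" data-preview='{ "type" : "static", "id" : "288857983" }'></li>
</ul>
'''

# A comma followed only by whitespace and then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",\s*(?=[}\]])")


def extract_preview_ids(html):
    """Return the "id" value from every <li data-preview=...> in html."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for li in soup.select("li[data-preview]"):
        raw = li["data-preview"]
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Naive repair: drop trailing commas and retry. This would also
            # change a ",}" sequence that happens to sit inside a string value.
            try:
                data = json.loads(TRAILING_COMMA.sub("", raw))
            except json.JSONDecodeError:
                continue  # still not parseable, skip this element
        if isinstance(data, dict) and "id" in data:
            ids.append(data["id"])
    return ids


print(extract_preview_ids(HTML))  # ['288857982', '288857983']
```

The ids come back as strings because that is how they are stored in the JSON; wrap them in int() if you need numbers.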
Answered By: lex