HTML Div Section with Categories

Question:

I hope this is an easy question, but I've struggled to find a solution or an explanation. Like others, I am attempting to pull the full Michelin list for an area, not only the restaurants ranked with Stars. The listing and details for each restaurant exist in this div section, but I don't understand how to select or parse out the data-* attributes. I'm able to write a selector that isolates this div, but I lack the next step to make it useful:

<div class="card__menu-footer d-flex js-match-height-footer">
<div class="card__menu-like box-placeholder js-favorite-restaurant" data-pid="1204329" data-enabled="false"
     data-category="restaurant.result"
     data-cooking-type="Japanese"
     data-country="ee"
     data-guide="Estonia"
     data-language="en"
     data-dtm-chef=""
     data-dtm-city="New York"
     data-dtm-distinction=""
     data-dtm-district="Manhattan"
     data-dtm-id="1204329"
     data-dtm-online-booking="False"
     data-dtm-price="none"
     data-dtm-region="New York State"
     data-restaurant-country="us"
     data-restaurant-name="Joji"
     data-restaurant-selection="USA">
    <img src="/assets/images/icons/love-off-58dca5751a8ad8f50468df25d762b097.svg" class="love-this pl-image" alt=""/>
</div>
</div>
Asked By: scipio1551

||

Answers:

You can use the .attrs property to access a tag's attributes. Here is an example of how you can parse the data-* attributes into a dict:

from bs4 import BeautifulSoup

html_doc = """
<div class="card__menu-footer d-flex js-match-height-footer">
<div class="card__menu-like box-placeholder js-favorite-restaurant" data-pid="1204329" data-enabled="false"
     data-category="restaurant.result"
     data-cooking-type="Japanese"
     data-country="ee"
     data-guide="Estonia"
     data-language="en"
     data-dtm-chef=""
     data-dtm-city="New York"
     data-dtm-distinction=""
     data-dtm-district="Manhattan"
     data-dtm-id="1204329"
     data-dtm-online-booking="False"
     data-dtm-price="none"
     data-dtm-region="New York State"
     data-restaurant-country="us"
     data-restaurant-name="Joji"
     data-restaurant-selection="USA">
    <img src="/assets/images/icons/love-off-58dca5751a8ad8f50468df25d762b097.svg" class="love-this pl-image" alt=""/>
</div>
</div>"""

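soup = BeautifulSoup(html_doc, "html.parser")

# Select the first element carrying the js-favorite-restaurant class
div = soup.select_one(".js-favorite-restaurant")

# Collect every data-* attribute, dropping the "data-" prefix from each key
out = {}
for attr, value in div.attrs.items():
    if attr.startswith("data-"):
        attr = attr.split("-", maxsplit=1)[-1]
        out[attr] = value

print(out)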

Prints (reformatted here for readability):

{
    "pid": "1204329",
    "enabled": "false",
    "category": "restaurant.result",
    "cooking-type": "Japanese",
    "country": "ee",
    "guide": "Estonia",
    "language": "en",
    "dtm-chef": "",
    "dtm-city": "New York",
    "dtm-distinction": "",
    "dtm-district": "Manhattan",
    "dtm-id": "1204329",
    "dtm-online-booking": "False",
    "dtm-price": "none",
    "dtm-region": "New York State",
    "restaurant-country": "us",
    "restaurant-name": "Joji",
    "restaurant-selection": "USA",
}
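
Since the goal is the full list for an area, the same idea can be applied to every card on a page rather than just the first one. The following is only a minimal sketch under some assumptions: that each restaurant card carries the js-favorite-restaurant class as in the snippet above, and that the page HTML is already available as a string. The helper name data_attrs and the second card ("Example Bistro") are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment with two cards. The first mirrors the question's
# snippet (trimmed); the second is invented purely for illustration.
html_doc = """
<div class="js-favorite-restaurant" data-restaurant-name="Joji" data-cooking-type="Japanese"></div>
<div class="js-favorite-restaurant" data-restaurant-name="Example Bistro" data-cooking-type="French"></div>
"""


def data_attrs(tag):
    """Return a tag's data-* attributes as a dict, without the "data-" prefix."""
    prefix = "data-"
    return {
        name[len(prefix):]: value
        for name, value in tag.attrs.items()
        if name.startswith(prefix)
    }


soup = BeautifulSoup(html_doc, "html.parser")

# select() returns every matching element, unlike select_one()
restaurants = [data_attrs(div) for div in soup.select(".js-favorite-restaurant")]

for restaurant in restaurants:
    print(restaurant["restaurant-name"], "|", restaurant["cooking-type"])

# A single attribute can also be read directly from a tag;
# .get() returns None instead of raising if the attribute is missing.
first = soup.select_one(".js-favorite-restaurant")
print(first.get("data-restaurant-name"))
```

The resulting list of dicts can then be passed to something like csv.DictWriter or a DataFrame constructor if a table is needed.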
Answered By: Andrej Kesely