How to request all sizes in stock – Python

Question:

I’m trying to request all the sizes in stock from Zalando. I can’t quite figure out how to do it, because the page in the video I’m following
looks different from mine.
The video I watched is this one: Video – 5.30

Does anyone know how to request the sizes and print the ones that are in stock?

The page I’m trying to request sizes from: here

My code looks like this:

import requests
from bs4 import BeautifulSoup as bs

session = requests.session()

def get_sizes_in_stock():
    global session
    endpoint = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
    response = session.get(endpoint)

    soup = bs(response.text, "html.parser")

I opened View page source and looked for the sizes, but I could not find them there.

I hope someone can help me figure out what to do.

Asked By: Robert Tacchini

||

Answers:

The sizes are in the page.

I found them in the HTML, inside a script tag, in this format:

{
    "sku": "NI112O0BT-A110090000",
    "size": "42.5",
    "deliveryOptions": [
        {
            "deliveryTenderType": "FASTER"
        }
    ],
    "offer": {
        "price": {
            "promotional": null,
            "original": {
                "amount": 114500
            },
            "previous": null,
            "displayMode": null
        },
        "merchant": {
            "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
        },
        "selectionContext": null,
        "isMeaningfulOffer": true,
        "displayFlags": [],
        "stock": {
            "quantity": "MANY"
        },
        "sku": "NI112O0BT-A110090000",
        "size": "42.5",
        "deliveryOptions": [
            {
                "deliveryTenderType": "FASTER"
            }
        ],
        "offer": {
            "price": {
                "promotional": null,
                "original": {
                    "amount": 114500
                },
                "previous": null,
                "displayMode": null
            },
            "merchant": {
                "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
            },
            "selectionContext": null,
            "isMeaningfulOffer": true,
            "displayFlags": [],
            "stock": {
                "quantity": "MANY"
            }
        },
        "allOffers": [
            {
                "price": {
                    "promotional": null,
                    "original": {
                        "amount": 114500
                    },
                    "previous": null,
                    "displayMode": null
                },
                "merchant": {
                    "id": "810d1d00-4312-43e5-bd31-d8373fdd24c7"
                },
                "selectionContext": null,
                "isMeaningfulOffer": true,
                "displayFlags": [],
                "stock": {
                    "quantity": "MANY"
                },
                "deliveryOptions": [
                    {
                        "deliveryWindow": "2022-05-23 - 2022-05-25"
                    }
                ],
                "fulfillment": {
                    "kind": "ZALANDO"
                }
            }
        ]
    }
}

If you parse the html with bs4 you should be able to find the script tag and extract the JSON.
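A minimal sketch of that approach, run against a small inline HTML sample instead of the live page. The sample markup, the helper name extract_json_containing and the top-level "simples" key are assumptions for illustration; the real page embeds a much larger and more deeply nested JSON object.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text from the question's code.
SAMPLE_HTML = """
<html><body>
<script type="application/json">{"unrelated": true}</script>
<script type="application/json">
{"simples": [
  {"size": "42", "offer": {"stock": {"quantity": "MANY"}}},
  {"size": "42.5", "offer": {"stock": {"quantity": "MANY"}}}
]}
</script>
</body></html>
"""

def extract_json_containing(html, needle):
    """Parse the first JSON script tag whose text contains `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.select('script[type="application/json"]'):
        text = tag.string or ""
        if needle in text:
            return json.loads(text)
    return None  # no matching script tag found

data = extract_json_containing(SAMPLE_HTML, '"size"')
sizes = {s["size"]: s["offer"]["stock"]["quantity"] for s in data["simples"]}
print(sizes)
```

On the real page the same helper would be called with response.text, and the returned object would then need to be traversed to reach the size entries.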

Answered By: Lukas Schmid

The sizes for the default color of the shoe are in the HTML. Alongside them are the URLs for the other colors. You can extract these into a dictionary and loop over it, requesting each page and pulling out the color and its size availability, which I think is what you are actually after (note: I have kept the code fairly generic to avoid hardcoding keys that change between requests):

import requests, re, json

def get_color_results(link):
    headers = {"User-Agent": "Mozilla/5.0"}
    r = requests.get(link, headers=headers).text
    data = json.loads(re.search(r'({"enrichedEntity".*size.*)</script', r).group(1))
    results = []
    color = ""
    for i in data["graphqlCache"]:
        if "ern:product" in i:
            if "product" in data["graphqlCache"][i]["data"]:
                if "name" in data["graphqlCache"][i]["data"]["product"]:
                    results.append(data["graphqlCache"][i]["data"]["product"])
                if (
                    color == ""
                    and "color" in data["graphqlCache"][i]["data"]["product"]
                ):
                    color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
    return (color, results)


link = "https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html"
final = {}
color, results = get_color_results(link)
colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}
final[color] = {
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}

for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }

print(final)
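To print only the sizes that are in stock, the final dictionary can be filtered by its quantity values. The sketch below uses a hand-written dictionary of the same shape. "MANY" is the value seen in the page JSON; the colour names and the "OUT_OF_STOCK" marker are assumptions, so check the values your own run actually returns.

```python
# Same shape as `final`: colour -> {size: stock quantity}.
# Colour names and "OUT_OF_STOCK" are made up for illustration.
final_example = {
    "white": {"38.5": "MANY", "42.5": "MANY", "44": "OUT_OF_STOCK"},
    "wolf grey": {"40": "OUT_OF_STOCK", "41": "MANY"},
}

OUT_OF_STOCK = "OUT_OF_STOCK"  # assumption: adjust to the value the site uses

def sizes_in_stock(availability, out_of_stock=OUT_OF_STOCK):
    """Return colour -> list of sizes whose quantity is not the out-of-stock marker."""
    return {
        colour: [size for size, qty in sizes.items() if qty != out_of_stock]
        for colour, sizes in availability.items()
    }

in_stock = sizes_in_stock(final_example)
for colour, sizes in in_stock.items():
    print(f"{colour}: {', '.join(sizes) if sizes else 'nothing in stock'}")
```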

Explanatory notes from chat:

  1. Use the Chrome browser to navigate to the link

  2. Press Ctrl + U to view the page source

  3. Press Ctrl + F to search for 38.5 in the HTML

    The first match is in the long string you already know about. That string is long, hard to navigate in the page source, and hard to tie to a particular tag. There are several ways to identify the right script tag, but for now an easy way is:

import requests
from bs4 import BeautifulSoup as bs

link = 'https://www.zalando.dk/nike-sportswear-air-max-90-sneakers-ni112o0bt-a11.html'
headers = {'User-Agent':'Mozilla/5.0'}
r = requests.get(link, headers = headers)
soup = bs(r.text, 'lxml')

for i in soup.select('script[type="application/json"]'):
    if '38.5' in i.text:
        print(i)
        break

A slower method would be:

import re

# string= replaces the older text= argument; the dot is escaped so it matches a literal "."
soup.find("script", string=re.compile(r"38\.5"))

  4. I used bs4 to get the right script tag contents so that I knew where the string holding the JavaScript object starts and ends. The final code instead applies re to the entire response text, with a regex pattern that pulls out that same string, and then deserializes it with json

  5. I put the entire page source into a regex tool and wrote a regex that returns the same string identified above. See that regex here

  6. Click on match 1, group 1 on the right-hand side to see the same string highlighted that you saw with BeautifulSoup. These are two different ways of getting the same string containing the sizes
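The regex route can be tried offline on a shortened sample. The pattern is the one used in get_color_results(); the sample source string and the key "k" are made up for illustration.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the page source.
SAMPLE_SOURCE = (
    '<script type="application/json">{"other": 1}</script>'
    '<script type="application/json">'
    '{"enrichedEntity": {}, "graphqlCache": {"k": {"data": {"size": "38.5"}}}}'
    '</script><p>footer</p>'
)

# Capture from {"enrichedEntity" up to the closing script tag.
PATTERN = re.compile(r'({"enrichedEntity".*size.*)</script')

match = PATTERN.search(SAMPLE_SOURCE)
if match is None:
    raise ValueError("pattern not found; the page layout may have changed")

data = json.loads(match.group(1))
print(data["graphqlCache"]["k"]["data"]["size"])
```

Because `.` does not match newlines and `.*` is greedy, the capture runs to the last `</script` on the same line as the JSON. If another script tag ever follows on that line, json.loads will fail, which is a reason to keep the None check and to expect occasional maintenance.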

  7. That is the string whose structure I needed to examine as JSON. See it in a JSON viewer here

  8. You will notice the JSON is deeply nested, with some dictionary keys that are likely dynamic. This meant I needed code that could traverse the JSON and use the more stable keys to pull out the available colours and, for the default shoe colour, the sizes and availability

  9. There is an expand all button in that JSON viewer. You can then search with Ctrl + F for 38.5 again
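One generic way to cope with dynamic keys, sketched here as an alternative to the key-by-key loop used in this answer, is a recursive search for any dictionary that contains a stable key such as "simples". The helper name find_dicts_with_key and the miniature sample are assumptions for illustration.

```python
def find_dicts_with_key(node, key):
    """Yield every dict, at any depth, that has `key` among its keys."""
    if isinstance(node, dict):
        if key in node:
            yield node
        for value in node.values():
            yield from find_dicts_with_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_dicts_with_key(item, key)

# Hypothetical miniature of the page JSON; "some-dynamic-key" stands in
# for the keys that change between requests.
sample = {
    "graphqlCache": {
        "some-dynamic-key": {
            "data": {
                "product": {
                    "name": "example shoe",
                    "simples": [
                        {"size": "38.5", "offer": {"stock": {"quantity": "MANY"}}}
                    ],
                }
            }
        }
    }
}

product = next(find_dicts_with_key(sample, "simples"))
sizes = {s["size"]: s["offer"]["stock"]["quantity"] for s in product["simples"]}
print(sizes)
```

The search never names the dynamic key, so it keeps working if that key changes, at the cost of walking the whole structure.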

10a) I noticed that the sizes and availability were for the default shoe colour

10b) I also noticed that, if I searched the JSON for one of the other colours from the dropdown, I could find URIs for each colour of shoe listed

  11. I used Wolf as my search term (as I expected fewer matches for that term within the JSON)

The screenshot that was here showed one of the alternate colours and its URI.

  12. I visited that URI and found the availability and shoe sizes for that colour in the same place as for the default white shoes

  13. I realised I could make an initial request to get the default colour and its sizes with availability and, from that same request, extract the other colours and their URIs

  14. I could then make requests to those other URIs and re-use my existing code to extract the sizes/availability for the new colours

  15. This is why I created my get_color_results() function. It is the re-usable code that extracts the sizes and availability from each page

  16. results holds all the matches within the JSON for the keys I use to navigate to the sizes and availabilities, as well as to the current colour

  17. This code traverses the JSON to get to the right place to extract the data I want to use later:

results = []
color = ""
for i in data["graphqlCache"]:
    if "ern:product" in i:
        if "product" in data["graphqlCache"][i]["data"]:
            if "name" in data["graphqlCache"][i]["data"]["product"]:
                results.append(data["graphqlCache"][i]["data"]["product"])
            if (
                color == ""
                and "color" in data["graphqlCache"][i]["data"]["product"]
            ):
                color = data["graphqlCache"][i]["data"]["product"]["color"]["name"]
  18. The following pulls out the sizes and availability from results:
{
    j["size"]: j["offer"]["stock"]["quantity"]
    for j in [i for i in results if "simples" in i][0]["simples"]
}
  19. For the first request only, the following gets the other shoe colours and their URIs into a dictionary to loop over later:
colors = {
    j["node"]["color"]["name"]: j["node"]["uri"]
    for j in [
        a
        for b in [
            i["family"]["products"]["edges"]
            for i in results
            if "family" in i
            if "products" in i["family"]
        ]
        for a in b
    ]
}
  20. This bit gets all the other colours and their availability:
for k, v in colors.items():
    if k not in final:
        color, results = get_color_results(v)
        final[color] = {
            j["size"]: j["offer"]["stock"]["quantity"]
            for j in [i for i in results if "simples" in i][0]["simples"]
        }
  21. Throughout, I update the dictionary final with each colour found and its sizes and availability
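The nested comprehension that builds colors may be easier to follow as plain loops. The sketch below is intended to behave the same way; the function name colour_uris and the sample data, including the example.com URIs, are made up for illustration.

```python
def colour_uris(results):
    """Map colour name -> URI, mirroring the nested `colors` comprehension."""
    colours = {}
    for product in results:
        # Skip entries without the family -> products path.
        if "family" not in product or "products" not in product["family"]:
            continue
        for edge in product["family"]["products"]["edges"]:
            node = edge["node"]
            colours[node["color"]["name"]] = node["uri"]
    return colours

# Hypothetical sample shaped like `results`.
sample_results = [
    {"name": "example shoe", "simples": []},
    {
        "name": "example shoe",
        "family": {
            "products": {
                "edges": [
                    {"node": {"color": {"name": "white"},
                              "uri": "https://example.com/white.html"}},
                    {"node": {"color": {"name": "wolf grey"},
                              "uri": "https://example.com/wolf-grey.html"}},
                ]
            }
        },
    },
]

colours = colour_uris(sample_results)
print(colours)
```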
Answered By: QHarr

Always check whether a hidden API is available; it will save you a lot of time.

In this case I found this API:

You can POST a payload and get a JSON response:

import http.client

# I extracted the payload from the network tab of my browser's debugging tools
payload = """[{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0S7-H11"}},{"id":"0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc","variables":{"id":"ern:product::NI112O0RY-A11"}}]"""

conn = http.client.HTTPSConnection("www.zalando.dk")

headers = {
    'content-type': "application/json"
}

conn.request("POST", "/api/graphql", payload, headers)

res = conn.getresponse()
res = res.read() # json output
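Rather than editing the payload string by hand, it can be generated for any list of product ids. This is a sketch: build_payload is a hypothetical helper, and the id value is simply copied from the payload above, so it may stop working if the site changes it.

```python
import json

# Copied from the payload above; presumably identifies the query to run.
QUERY_ID = "0ec65c3a62f6bd0b29a59f22021a44f42e6282b7f8ff930718a1dd5783b336fc"

def build_payload(product_ids, query_id=QUERY_ID):
    """Return the JSON body for a batch of product ids such as 'NI112O0S7-H11'."""
    batch = [
        {"id": query_id, "variables": {"id": f"ern:product::{pid}"}}
        for pid in product_ids
    ]
    # Compact separators reproduce the original string byte for byte.
    return json.dumps(batch, separators=(",", ":"))

payload = build_payload(["NI112O0S7-H11", "NI112O0RY-A11"])
print(payload)
```

The resulting string can be passed to conn.request() exactly like the hand-written payload.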

For each product, res contains a JSON node listing the available sizes:

"simples": [
        {
            "size": "38.5",
            "sku": "NI112O0P5-A110060000"
        },
        {
            "size": "44.5",
            "sku": "NI112O0P5-A110105000"
        },
        {
            ...

It’s now easy to extract the information.
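A sketch of that extraction, using a hand-written sample in place of the real response. Only the "simples" entries are taken from the output above; the wrapper keys "data" and "product" are assumptions about the response shape, so inspect your own response before relying on them.

```python
import json

# Hypothetical response body (bytes, as returned by res.read()).
SAMPLE_RESPONSE = b"""[
  {"data": {"product": {"simples": [
      {"size": "38.5", "sku": "NI112O0P5-A110060000"},
      {"size": "44.5", "sku": "NI112O0P5-A110105000"}
  ]}}}
]"""

def sizes_per_product(raw):
    """Return one list of sizes per product in the batched response."""
    entries = json.loads(raw)  # json.loads accepts bytes as well as str
    return [
        [simple["size"] for simple in entry["data"]["product"]["simples"]]
        for entry in entries
    ]

all_sizes = sizes_per_product(SAMPLE_RESPONSE)
print(all_sizes)
```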

There is also a field that indicates whether the product has a promotion, which is handy if you want to track discounts.
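A sketch of such a discount check. It assumes the field in question is the promotional entry under offer.price, as seen in the page JSON in the first answer, and that it carries an amount like original does when a promotion applies. Both are assumptions, and the promotional amount below is invented.

```python
def discount_info(offer):
    """Return price details if the offer has a promotional price, else None."""
    price = offer["price"]
    original = price["original"]["amount"]
    promotional = price.get("promotional")
    if not promotional:
        return None  # null in the JSON means no promotion
    return {
        "original": original,
        "promotional": promotional["amount"],
        "saved": original - promotional["amount"],
    }

# First sample mirrors the page JSON; the second is a made-up discounted offer.
full_price = {"price": {"promotional": None, "original": {"amount": 114500}}}
on_sale = {"price": {"promotional": {"amount": 91600}, "original": {"amount": 114500}}}

print(discount_info(full_price))
print(discount_info(on_sale))
```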

Answered By: obchardon