Web Scraping a Website in Python Using Its application/ld+json Data

Question:

I am trying to get the price of one item from the website at the URL below. However, I am running into issues when looking at the page source.

The URL is: https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love

The part of the page source I am interested in is, I think, the following:

<script type="application/ld+json">
    [{

"@context":"http://schema.org",
"@type":"Product",
"productID":"25372685655708131",
"name":"LOVE bracelet, small model",
"description":"#LOVE# bracelet, small model, yellow gold 750/1000. Supplied with a screwdriver. Width: 3.65 mm (for size 17). Now available in a slimmer version, Cartier continues to write the story of the #LOVE# bracelet. Same design, same oval shape, same story: a timeless – yet slightly slimmer – creation which is fastened using a screwdriver. The closure is designed with a functional screw on one side of the bracelet and a hinge on the other. To determine the size of your #LOVE# bracelet, measure your wrist, adding one centimetre to your size for a tighter fit, or two centimetres for a looser fit.",
"image":["https://www.cartier.com/variants/images/25372685655708131/img1/w960.jpg"],
"offers": 
[{"@type":"Offer","availability":"http://schema.org/InStock","priceCurrency":"GBP","price":"4100","sku":"0400574782829","url":"https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"}]}]
    </script>
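Note that the block above is a JSON array wrapping a single Product object, and that offers is an array as well, so the price sits behind two list indices: data[0]['offers'][0]['price']. Looking up data['offers'] directly on the parsed result raises a TypeError, because the top level is a list. A minimal offline sketch, parsing a trimmed copy of the snippet (description and image dropped for brevity):

```python
import json

# Trimmed copy of the JSON-LD block above (description and image omitted).
LD_JSON = """
[{
  "@context": "http://schema.org",
  "@type": "Product",
  "productID": "25372685655708131",
  "name": "LOVE bracelet, small model",
  "offers": [{
    "@type": "Offer",
    "availability": "http://schema.org/InStock",
    "priceCurrency": "GBP",
    "price": "4100",
    "sku": "0400574782829",
    "url": "https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"
  }]
}]
"""

data = json.loads(LD_JSON)    # the top level is a list ...
product = data[0]             # ... holding one Product dict
offer = product["offers"][0]  # "offers" is also a list of Offer dicts
price = int(offer["price"])   # the price is stored as the string "4100"

print(product["name"], offer["priceCurrency"], price)
```

This prints LOVE bracelet, small model GBP 4100.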

I have tried the following steps:

import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    data = json.loads(soup.find_all('script', {'type': 'application/ld+json'})[-1].get_text())
    return url, int(data['offers']['price'])

if __name__ == '__main__':

    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']

    with Pool(processes=4) as pool:
        for url, price in pool.imap_unordered(get_price, urls):
            data['offers_price'].append(price)
            data['url'].append(url)
    print(data)
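One way to make get_price more defensive, as a sketch: scan every ld+json script rather than only the last one, accept either a list or a single object at the top level and for offers, and return None instead of raising when no Product price is found. It uses the standard library's html.parser in place of BeautifulSoup so it runs without extra packages. LdJsonCollector and extract_ld_price are hypothetical names, and the sample HTML is a made-up page built from a trimmed copy of the snippet above.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so concatenate them.
        if self._in_ld:
            self.blocks[-1] += data


def _as_list(value):
    # JSON-LD may hold either a single object or a list of objects.
    return value if isinstance(value, list) else [value]


def extract_ld_price(html):
    """Return the first Offer price (as int) from any Product JSON-LD block, or None."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            parsed = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of crashing
        for item in _as_list(parsed):
            if isinstance(item, dict) and item.get("@type") == "Product":
                for offer in _as_list(item.get("offers", [])):
                    if isinstance(offer, dict) and "price" in offer:
                        # Accepts "4100" or "4100.00"; like the original, keeps an int.
                        return int(float(offer["price"]))
    return None


sample_html = """
<html><head>
<script type="application/ld+json">
[{"@context": "http://schema.org", "@type": "Product",
  "offers": [{"@type": "Offer", "priceCurrency": "GBP", "price": "4100"}]}]
</script>
</head></html>
"""
print(extract_ld_price(sample_html))  # 4100
```

Against the live page the input would be the fetched HTML, for example extract_ld_price(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text). Whether the JSON-LD is present in that raw response depends on how the site serves it, so a None result is worth checking for.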

But it was not successful. How would you approach this?

Asked By: Seedizens


Answers:

I was able to get the price, but I took it from the data-model attribute of the product-price tag instead of the JSON-LD script:

import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    data = json.loads(soup.find_all('product-price')[-1]['data-model'])
    return url, int(data['fullPrice'])

if __name__ == '__main__':

    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']

    with Pool(processes=4) as pool:
        for url, price in pool.imap_unordered(get_price, urls):
            data['offers_price'].append(price)
            data['url'].append(url)
    print(data)
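This approach works because the product-price element stores its data as a JSON string in its data-model attribute, which json.loads then turns into a dict with a fullPrice key. A minimal sketch of that two-step parse, using the standard library's html.parser in place of BeautifulSoup. ProductPriceFinder and full_price are hypothetical names, and the sample markup is made up apart from the fullPrice key and the 4100 value seen in the output below; the real attribute may carry other keys.

```python
import json
from html.parser import HTMLParser


class ProductPriceFinder(HTMLParser):
    """Records the data-model attribute of every <product-price> element."""

    def __init__(self):
        super().__init__()
        self.models = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case, custom elements included.
        if tag == "product-price":
            model = dict(attrs).get("data-model")
            if model:
                self.models.append(model)


def full_price(html):
    """Parse the last product-price data-model as JSON and return fullPrice, or None."""
    finder = ProductPriceFinder()
    finder.feed(html)
    if not finder.models:
        return None
    model = json.loads(finder.models[-1])  # the attribute value is a JSON string
    return int(model["fullPrice"]) if "fullPrice" in model else None


# Made-up minimal markup for illustration.
sample_html = """<product-price data-model='{"fullPrice": "4100"}'></product-price>"""
print(full_price(sample_html))  # 4100
```

Returning None when the element or key is missing makes it easier to spot pages whose markup differs, rather than failing with an IndexError or KeyError inside the worker pool.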

Output:

{'url': ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love'], 'offers_price': [4100]}

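By the way, are you sure you want to append the url and the price? If you only ever scrape one URL, you could assign them directly instead (with several URLs, assignment keeps only the last result, so appending is the right choice there):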

data['offers_price'] = price
data['url'] = url
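When several URLs are scraped, keeping one record per result preserves every price, while assigning inside the loop keeps only the last one. A minimal sketch of both behaviours, using hypothetical placeholder results in place of the values produced by pool.imap_unordered(get_price, urls); collect is a hypothetical helper name.

```python
def collect(results):
    """Turn (url, price) pairs into one dict per result, keeping every entry."""
    return [{"url": url, "offers_price": price} for url, price in results]


# Hypothetical placeholder results standing in for the pool's output.
results = [
    ("https://example.com/item-a", 100),
    ("https://example.com/item-b", 200),
]

rows = collect(results)
print(rows)

# Assigning in the loop instead overwrites the earlier values:
last = {}
for url, price in results:
    last["url"] = url
    last["offers_price"] = price
print(last)  # {'url': 'https://example.com/item-b', 'offers_price': 200}
```

Since pandas is already imported, the list of dicts can be passed straight to pd.DataFrame(rows) to get one row per URL.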
Answered By: Joan Lara