Scraping h5 header text in between div tags

Question

I am trying to scrape product prices from this website. How would I go about getting the text value inside an h5 heading that sits between div elements?

HTML:

<div class="product-item">
<a href="/product-catalogue?pid=6963">
<div class="list-item-image">
<img src="https://app.digitalconcept.mn/upload/media/product/0001/05/thumb_4760_product_thumb.png" alt="Кофе Bestcup rich creamy 3NI1 1ш">
</div>
<h5>Кофе Bestcup rich creamy 3NI1 1ш</h5>
<div class="price">500₮</div>
</a>
</div>

My current code:

# function to parse
def parse(self, response, **kwargs):
    data = response.xpath(".//div[contains(@class,'product-item')]")
    for item in data:
        yield {
            "name": data.xpath(".//*[@class='h5']/text()").get(),
            "price": data.xpath(".//div[contains(@class,'price')]/text()").get()
        }

My current output:
{'name': None, 'price': '3,700₮'}

My expected output:
{'name': 'Үхрийн махтай кимбаб', 'price': '3,700₮'}

Any and all help is appreciated. Thank you.

Asked By: Sod

Answer 1

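I had difficulty finding the element on the site because it is not in English, and your expected output does not match the HTML you provided, so this answer is based on the provided HTML.

Two things need to change. First, [@class='h5'] matches an element whose class attribute is "h5", but h5 is the tag name, so select the tag directly with .//h5. Second, inside the loop you query data (the whole list of products) instead of item (the current product), so every result comes from the first match on the page.

You should change your code from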

def parse(self, response, **kwargs):
    data = response.xpath(".//div[contains(@class,'product-item')]")
    for item in data:
        yield {
            "name": data.xpath(".//*[@class='h5']/text()").get(),
            "price": data.xpath(".//div[contains(@class,'price')]/text()").get()
        }

To

def parse(self, response, **kwargs):
    data = response.xpath(".//div[contains(@class,'product-item')]")
    for item in data:
        yield {
            "name": item.xpath(".//h5/text()").get(),
            "price": item.xpath(".//div[contains(@class,'price')]/text()").get()
        }
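
To check both fixes without running the spider, here is a minimal offline sketch. It is not Scrapy code: it uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors, so it matches class names exactly instead of using contains(). The sample markup is a simplified, well-formed version of the HTML in the question, and the second product, the pid=0000 link, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the question's HTML (the <img> is left out
# so the markup parses as XML). The second product is invented.
SAMPLE = """
<div>
  <div class="product-item">
    <a href="/product-catalogue?pid=6963">
      <h5>Кофе Bestcup rich creamy 3NI1 1ш</h5>
      <div class="price">500₮</div>
    </a>
  </div>
  <div class="product-item">
    <a href="/product-catalogue?pid=0000">
      <h5>Second product</h5>
      <div class="price">3,700₮</div>
    </a>
  </div>
</div>
"""


def extract_products(markup):
    """Return one {'name', 'price'} dict per product-item block."""
    root = ET.fromstring(markup)
    products = []
    for item in root.findall(".//div[@class='product-item']"):
        products.append({
            # Query relative to the current item, and select the h5 tag itself.
            "name": item.findtext(".//h5"),
            "price": item.findtext(".//div[@class='price']"),
        })
    return products


def elements_with_class_h5(markup):
    """Mimic the original selector: elements whose class attribute is 'h5'."""
    root = ET.fromstring(markup)
    return root.findall(".//*[@class='h5']")


products = extract_products(SAMPLE)
for product in products:
    print(product)

# The original selector finds nothing, which is why "name" came back as None.
print(len(elements_with_class_h5(SAMPLE)))
```

Each product gets its own name and price because the lookups start from item rather than from the full result list. The class-based selector returns no elements because nothing in the markup has class="h5".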

Answered By: Akzy
