Merchant id not found – Amazon

Question:

I am unable to find the merchant ID on Amazon product pages. Am I missing something? Any help would be great!
I always get the same message in the terminal: "No Merchant ID found".
Website URL: https://www.amazon.com/dp/B004X4KRW0/ref=olp-opf-redir?aod=1&ie=UTF8&condition=NEW&th=1
Goal: to list all the merchant IDs using Python.
What is a merchant ID?
Every seller on Amazon is uniquely identified by a merchant ID. For example, on the page above, the merchant ID of Amazon.com (US) as a seller is ATVPDKIKX0DER, and it appears in the HTML like this:
<div id="fast-track" class="a-section a-spacing-none"> <input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/> <input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>
So I am trying to use XPath to print the merchant ID of every seller.
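A minimal offline sketch of the XPath idea, using the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages. Assumptions: the markup is a hand-written, well-formed stand-in (real Amazon HTML is not valid XML, so lxml.html would be needed on a live page), the "offer" wrapper class and the second merchant ID are hypothetical, and because ElementTree's limited XPath has no "or", the two input ids are tried one after the other.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a page with two offers. The "offer"
# class and the second merchant ID (A1EXAMPLE00001) are made up for illustration.
SAMPLE = """
<root>
  <div class="offer">
    <div id="fast-track" class="a-section a-spacing-none">
      <input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/>
      <input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>
    </div>
  </div>
  <div class="offer">
    <input type="hidden" id="ddmSelectMerchant" value="A1EXAMPLE00001"/>
  </div>
</root>
"""

# ElementTree's XPath subset has no "or", so each id is tried in turn.
MERCHANT_INPUT_IDS = ("ftSelectMerchant", "ddmSelectMerchant")


def merchant_ids(xml_text):
    """Return the merchant ID found in each offer block (None if missing)."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for offer in root.iterfind(".//div[@class='offer']"):
        found = None
        for input_id in MERCHANT_INPUT_IDS:
            node = offer.find(f".//input[@id='{input_id}']")
            if node is not None:
                found = node.get("value")
                break
        results.append(found)
    return results


print(merchant_ids(SAMPLE))  # ['ATVPDKIKX0DER', 'A1EXAMPLE00001']
```

Returning None for an offer without a matching input keeps one missing seller from hiding the others.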

import logging
import re

log = logging.getLogger(__name__)

# `offer` is assumed to be an lxml element for one seller's offer block
# (how it is obtained is not shown in the question).

# Get seller merchant ID
# Default merchant ID
merchant_id = ""
# Try to find the merchant ID with XPath
try:
    merchant_id = offer.xpath(
        ".//input[@id='ftSelectMerchant' or @id='ddmSelectMerchant']"
    )[0].get("value", "")
except IndexError:
    # Fall back to finding the merchant ID in an inline script with a regex
    try:
        merchant_script = (offer.xpath(".//script")[0].text or "").strip()
        find_merchant_id = re.search(
            r'merchantId = "(\w+?)";', merchant_script
        )
        if find_merchant_id:
            merchant_id = find_merchant_id.group(1)
    except IndexError:
        pass
log.info(f"merchant_id: {merchant_id}")
# Log failure to find merchant ID
if not merchant_id:
    log.debug("No Merchant ID found")
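The regex fallback needs \w (any word character), not a bare w, which matches only the letter "w". A quick offline check against a made-up script string; the merchantId = "..."; format is only what the question's regex assumes, not something verified on a live page.

```python
import re

# Hypothetical inline script in the shape the fallback regex expects.
SCRIPT = 'var cfg = {}; merchantId = "ATVPDKIKX0DER"; cfg.ready = true;'

# Bare "w+?" only matches runs of the letter "w", so this finds nothing.
broken = re.search(r'merchantId = "(w+?)";', SCRIPT)
# "\w+?" matches word characters, lazily, up to the closing '";'.
fixed = re.search(r'merchantId = "(\w+?)";', SCRIPT)

print(broken)          # None
print(fixed.group(1))  # ATVPDKIKX0DER
```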
Asked By: rf_dante


Answers:

It seems you are scraping hidden parameters. There are probably many ways to do this; here are the two I use.

Here is one way using Selenium. element.get_attribute("innerHTML") returns the element's inner HTML as a string, and you can then extract the value with a regex.

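import re

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

url = "https://www.amazon.com/dp/B004X4KRW0"

# Run Firefox headless (options.headless was removed in newer Selenium 4 releases)
options = Options()
options.add_argument("-headless")

driver = webdriver.Firefox(options=options)
try:
    driver.get(url)

    # find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
    element = driver.find_element(By.XPATH, "//div[@class='a-section']")

    innerhtml = element.get_attribute("innerHTML")

    # Find the merchantID input and capture its value;
    # [^>]* and [^"]* keep the match inside one tag and one attribute
    match = re.search(r'<[^>]*merchantID[^>]*value="([^"]*)"', innerhtml)

    print(match.group(1) if match else "merchantID not found")  # ATVPDKIKX0DER
finally:
    # Always close the browser, even if the element is not found
    driver.quit()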
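A regex like '<.*merchantID.*value="(.*)"' uses greedy .*, which can run past the merchantID tag when the innerHTML contains more than one hidden input. A small offline comparison on a made-up innerHTML string (the second "session" input is hypothetical):

```python
import re

# Hypothetical innerHTML: the merchantID input followed by another hidden input.
INNER_HTML = (
    '<input type="hidden" id="merchantID" name="merchantID" value="ATVPDKIKX0DER">'
    '<input type="hidden" id="session" value="abc">'
)

# Greedy ".*" backtracks only as far as the LAST value="..." in the string.
greedy = re.search('<.*merchantID.*value="(.*)"', INNER_HTML).group(1)

# [^>]* and [^"]* cannot cross a tag end or an attribute quote.
bounded = re.search(r'<[^>]*merchantID[^>]*value="([^"]*)"', INNER_HTML).group(1)

print(greedy)   # abc
print(bounded)  # ATVPDKIKX0DER
```

The greedy version happens to work when the merchantID input is the last one in the element, which is why it can pass a quick test and still break later.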

Another way is to use BeautifulSoup with urllib.request. This is simpler but sometimes fails (probably because of how the server responds to non-browser requests; I'm not sure).

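import urllib.request

from bs4 import BeautifulSoup

url = 'https://www.amazon.com/dp/B004X4KRW0'

# Send a browser-like User-Agent; requests without one may be rejected
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
html = urllib.request.urlopen(req).read().decode('utf-8')

soup = BeautifulSoup(html, "lxml")

# find() returns None instead of raising IndexError when the input is missing
merchant_input = soup.find("input", {"id": "merchantID"})
value = merchant_input["value"] if merchant_input else None

print(value)  # ATVPDKIKX0DER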
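If BeautifulSoup or lxml is not installed, the standard library's html.parser can do the same lookup. This sketch is a swapped-in technique, not the method above: it collects the id and value of every hidden input and returns None when the requested id is absent, rather than raising IndexError the way indexing an empty find_all() result does. The sample page string is hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect id -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and "id" in attr:
            self.values[attr["id"]] = attr.get("value")


def hidden_input_value(html, input_id):
    """Return the hidden input's value, or None if the page lacks it."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.values.get(input_id)


page = '<div><input type="hidden" id="merchantID" name="merchantID" value="ATVPDKIKX0DER"></div>'
print(hidden_input_value(page, "merchantID"))             # ATVPDKIKX0DER
print(hidden_input_value("<html></html>", "merchantID"))  # None
```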

(I use Selenium to monitor price changes on Amazon, and the attribute names have changed a few times. It is worth checking from time to time that everything still works.)
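Because the attribute names change, one hedged approach is to try every id mentioned in this thread, in order, and log a warning when none of them matches, so a scheduled check can flag a markup change early. A regex-based sketch; the candidate list is only what appears on this page, and Amazon may use other ids.

```python
import logging
import re

log = logging.getLogger(__name__)

# Ids seen in this thread; Amazon may rename them at any time.
CANDIDATE_IDS = ("merchantID", "ftSelectMerchant", "ddmSelectMerchant")


def find_merchant_id(html, candidates=CANDIDATE_IDS):
    """Return (matched_id, value) for the first candidate present, else None."""
    for input_id in candidates:
        # Grab the whole <input ...> tag carrying this id, in any attribute order.
        tag = re.search(rf'<input\b[^>]*\bid="{re.escape(input_id)}"[^>]*>', html)
        if tag:
            # Then read the value attribute from that tag only.
            value = re.search(r'\bvalue="([^"]*)"', tag.group(0))
            if value:
                return input_id, value.group(1)
    log.warning("No known merchant ID input found; the page markup may have changed")
    return None


snippet = ('<div id="fast-track" class="a-section a-spacing-none"> '
           '<input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/> '
           '<input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>')
print(find_merchant_id(snippet))  # ('ftSelectMerchant', 'ATVPDKIKX0DER')
```

Returning which id matched, not just the value, makes it easy to notice when the page switches from one id to another.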

Answered By: shimo