Merchant id not found – Amazon
Question:
I am unable to find the merchant ID on Amazon product pages. Am I missing something? Any help would be great!
I always get the same message in the terminal: "No Merchant ID found".
Website URL: https://www.amazon.com/dp/B004X4KRW0/ref=olp-opf-redir?aod=1&ie=UTF8&condition=NEW&th=1
Goal: To list all the merchant IDs using Python.
What is a merchant ID?
A merchant ID uniquely identifies each seller on Amazon. For example, on the page at the URL above, the merchant ID of Amazon.com (US) as a seller is ATVPDKIKX0DER, and it appears in the HTML like this:
<div id="fast-track" class="a-section a-spacing-none"> <input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/> <input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>
So I am trying to use XPath to print the merchant ID for every seller.
# Get seller merchant ID
# Default merchant ID
merchant_id = ""
# Try to find merchant ID with XPath
try:
    merchant_id = offer.xpath(
        ".//input[@id='ftSelectMerchant' or @id='ddmSelectMerchant']"
    )[0].value
except IndexError:
    # Fall back to finding the merchant ID with a regex
    try:
        merchant_script = offer.xpath(".//script")[0].text.strip()
        find_merchant_id = re.search(
            r'merchantId = "(\w+?)";', merchant_script
        )
        if find_merchant_id:
            merchant_id = find_merchant_id.group(1)
    except IndexError:
        pass
log.info(f"merchant_id: {merchant_id}")
# Log failure to find merchant ID
if not merchant_id:
    log.debug("No Merchant ID found")
Answers:
It seems you are scraping hidden parameters. There are many ways to do this; I'll show the two I use.
The first uses Selenium. element.get_attribute("innerHTML") returns the element's HTML as a string, and a regex then extracts the value.
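import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
url = "https://www.amazon.com/dp/B004X4KRW0"
# run Firefox headless (Selenium 4 style; older versions used options.headless = True)
options = Options()
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)
try:
    driver.get(url)
    # find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
    element = driver.find_element(By.XPATH, "//div[@class='a-section']")
    innerhtml = element.get_attribute("innerHTML")
finally:
    driver.quit()
# find and get value; [^>]* and [^"]* keep the match inside a single tag
a = re.search(r'<[^>]*merchantID[^>]*value="([^"]*)"', innerhtml)
print(a.group(1) if a else "No Merchant ID found")  # ATVPDKIKX0DER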
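The regex step can be tried on its own, without a browser. Below is a minimal sketch, assuming the hidden input looks like the one quoted in the question. `SAMPLE_HTML`, `MERCHANT_INPUT` and `extract_merchant_ids` are hypothetical names for illustration, and the list of element ids is only what has been mentioned in this thread, not a confirmed list of what Amazon serves.

```python
import re

# Sample markup modelled on the snippet quoted in the question (not fetched live).
SAMPLE_HTML = (
    '<div id="fast-track" class="a-section a-spacing-none">'
    '<input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/>'
    '<input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>'
    '</div>'
)

# Match one <input> tag whose id is one of the names mentioned in this thread
# (an assumption), then capture its value. [^>]* keeps the match inside one tag.
MERCHANT_INPUT = re.compile(
    r'<input\b[^>]*\bid="(?:ftSelectMerchant|ddmSelectMerchant|merchantID)"'
    r'[^>]*\bvalue="([^"]*)"'
)


def extract_merchant_ids(html):
    """Return every merchant ID value found in matching inputs, in page order."""
    return MERCHANT_INPUT.findall(html)


print(extract_merchant_ids(SAMPLE_HTML))  # ['ATVPDKIKX0DER']
```

This pattern assumes the id attribute comes before the value attribute inside the tag. If the page orders them differently, an HTML parser is the safer choice.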
The other way uses BeautifulSoup and urllib.request. It is simpler but sometimes fails (probably because of how the server responds; I'm not sure).
import urllib.request
from bs4 import BeautifulSoup
url = 'https://www.amazon.com/dp/B004X4KRW0'
html = urllib.request.urlopen(url).read().decode('utf-8')
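soup = BeautifulSoup(html, "lxml")
# find() returns None when the input is missing, e.g. if the server sent a different page
tag = soup.find("input", {"id": "merchantID"})
print(tag["value"] if tag else "No Merchant ID found")  # ATVPDKIKX0DER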
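Since the goal in the question is to list all merchant IDs, the same lookup can be sketched so that it collects every matching input instead of only the first. This sketch swaps BeautifulSoup for the standard library's `html.parser`, so it runs without third-party packages. `HiddenInputCollector` and `find_merchant_ids` are hypothetical names, and the default ids are assumptions taken from this thread.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the value of every <input> whose id is in `wanted_ids`."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attributes = dict(attrs)
        if attributes.get("id") in self.wanted_ids and attributes.get("value"):
            self.values.append(attributes["value"])


def find_merchant_ids(html, wanted_ids=("merchantID", "ftSelectMerchant")):
    """Return all merchant ID values in page order (empty list if none)."""
    parser = HiddenInputCollector(wanted_ids)
    parser.feed(html)
    parser.close()
    return parser.values


sample = (
    '<input type="hidden" id="ftSelectAsin" value="B004X4KRW0"/>'
    '<input type="hidden" id="ftSelectMerchant" value="ATVPDKIKX0DER"/>'
)
print(find_merchant_ids(sample))  # ['ATVPDKIKX0DER']
```

An empty list here means the inputs were not in the HTML that was parsed. That is worth checking before assuming the selector is wrong.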
(I use Selenium to watch Amazon for price changes, and attribute names have changed a few times. So it is worth checking from time to time that everything still works.)
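One way to notice such a change is to try several known patterns in order and log a warning when none of them matches. This is only a sketch: `CANDIDATES` and `locate_merchant_id` are hypothetical names, and the patterns are the ones mentioned in this thread, with no guarantee that any of them matches the live page.

```python
import logging
import re

log = logging.getLogger("merchant_check")

# Candidate patterns, tried in order. All are taken from this thread (assumptions).
CANDIDATES = [
    ("ftSelectMerchant input", re.compile(r'id="ftSelectMerchant"[^>]*value="(\w+)"')),
    ("merchantID input", re.compile(r'id="merchantID"[^>]*value="(\w+)"')),
    ("inline script", re.compile(r'merchantId = "(\w+?)";')),
]


def locate_merchant_id(html):
    """Return (pattern_name, merchant_id), or (None, "") if nothing matched."""
    for name, pattern in CANDIDATES:
        match = pattern.search(html)
        if match:
            return name, match.group(1)
    log.warning("No merchant ID pattern matched; the page markup may have changed")
    return None, ""


print(locate_merchant_id('<input type="hidden" id="merchantID" value="ATVPDKIKX0DER"/>'))
```

Returning the name of the pattern that matched makes it easy to see in the logs when the page has switched from one markup variant to another.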