Why am I getting AttributeError: 'NoneType' object has no attribute 'get_text' whenever I try to scrape this ecommerce store?

Question:

I’m trying to scrape an ecommerce store, but I keep getting AttributeError: 'NoneType' object has no attribute 'get_text'. It happens whenever I try to iterate over the products through each product link. I’m not sure whether I’m running into JavaScript rendering, a captcha, or something else. Here’s my code:

import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.jumia.com'

headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

productlinks = []

for x in range(1,51):
    r = requests.get(f'https://www.jumia.com.ng/ios-phones/?page={x}#catalog-listing/')
    soup = BeautifulSoup(r.content, 'lxml')

    productlist = soup.find_all('article', class_='prd _fb col c-prd')

    for product in productlist:
        for link in product.find_all('a', href=True):
            productlinks.append(baseurl + link['href'])
           
for link in productlinks:
    r = requests.get(link, headers = headers)
    soup = BeautifulSoup(r.content, 'lxml')
    
    name = soup.find('h1', class_='-fs20 -pts -pbxs').get_text(strip=True)
    amount = soup.find('span', class_='-b -ltr -tal -fs24').get_text(strip=True)
    review = soup.find('div', class_='stars _s _al').get_text(strip=True)
    rating = soup.find('a', class_='-plxs _more').get_text(strip=True)
    features = soup.find_all('li', attrs={'style': 'box-sizing: border-box; padding: 0px; margin: 0px;'})
    a = features[0].get_text(strip=True)
    b = features[1].get_text(strip=True)
    c = features[2].get_text(strip=True)
    d = features[3].get_text(strip=True)
    e = features[4].get_text(strip=True)
    f = features[5].get_text(strip=True)

    
    print(f"Name: {name}")
    print(f"Amount: {amount}")
    print(f"Review: {review}")
    print(f"Rating: {rating}")

    print('Key Features')
    print(f"a: {a}")
    print(f"b: {b}")
    print(f"c: {c}")
    print(f"d: {d}")
    print(f"e: {e}")
    print(f"f: {f}")
               
    print('') 

Here’s the error message:

Traceback (most recent call last):
  File "c:\Users\LP\Documents\jumia\jumia.py", line 32, in <module>
    name = soup.find('h1', class_='-fs20 -pts -pbxs').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'
Asked By: Miracle


Answers:

Change the variable baseurl to 'https://www.jumia.com.ng' and change the features variable to features = soup.find('article', class_='col8 -pvs').find_all('li'). After fixing those two issues, you’ll probably get an IndexError, because not every product page lists six features. You can use a loop like the following to iterate over however many features there are and print them:

for i, feature in enumerate(features):
    print(chr(ord("a")+i) + ":", feature.get_text(strip=True))

With this for loop, you don’t need the variables a through f. The chr(ord("a")+i) expression produces the letter corresponding to index i. However, if there are more than 26 features, it will print punctuation or other garbage characters; that is trivially fixed by breaking out of the loop when i > 25. (This trick assumes ASCII; it won’t work on EBCDIC systems, where the letters are not contiguous.)
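As a sketch of an alternative, zipping the features against string.ascii_lowercase gives the same a:, b:, … labels and stops by itself after 26 items, so no explicit break is needed (the sample features list here is made up for illustration):

```python
import string

# made-up sample data standing in for the feature.get_text(strip=True) results
features = ["6.1-inch display", "128 GB storage", "A15 Bionic chip"]

# zip stops at the shorter iterable, so at most 26 labels are ever produced
for letter, feature in zip(string.ascii_lowercase, features):
    print(f"{letter}: {feature}")
```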

Even after making these three changes, there was still an AttributeError when the script tried to scrape a link to a product unrelated to iPhones, which showed up on page 5 of the results. I don’t know how the script got that link; it was a medicinal cream. To fix that, either wrap the body of the second for loop in a try/except like the following, or guard the append in the first loop with if 'iphone' in link['href']:

for link in productlinks:
    try:
        # body of the for loop goes here
    except AttributeError:
        continue
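Another option, instead of catching the exception, is to check each find() result for None before calling get_text. A minimal sketch, using a helper I’m calling text_or_none (the name is mine, not from the original script):

```python
from bs4 import BeautifulSoup

def text_or_none(tag, strip=True):
    # Return the tag's text, or None when the selector matched nothing
    return tag.get_text(strip=strip) if tag is not None else None

soup = BeautifulSoup('<h1 class="title"> iPhone 13 </h1>', 'html.parser')
print(text_or_none(soup.find('h1')))  # prints: iPhone 13
print(text_or_none(soup.find('h2')))  # no <h2> in the markup, prints: None
```

This way missing fields come back as None values you can inspect or skip, rather than one bad selector aborting the whole product.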

With these changes, the script would look like this:

import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.jumia.com.ng'

headers = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

productlinks = []

for x in range(1,51):
    r = requests.get(f'https://www.jumia.com.ng/ios-phones/?page={x}#catalog-listing/')
    soup = BeautifulSoup(r.content, 'lxml')

    productlist = soup.find_all('article', class_='prd _fb col c-prd')

    for product in productlist:
        for link in product.find_all('a', href=True):
            if 'iphone' in link['href']:
                productlinks.append(baseurl + link['href'])
           
for link in productlinks:
    r = requests.get(link, headers = headers)
    soup = BeautifulSoup(r.content, 'lxml')

    try:
        name = soup.find('h1', class_='-fs20 -pts -pbxs').get_text(strip=True)
        amount = soup.find('span', class_='-b -ltr -tal -fs24').get_text(strip=True)
        review = soup.find('div', class_='stars _s _al').get_text(strip=True)
        rating = soup.find('a', class_='-plxs _more').get_text(strip=True)
        features = soup.find('article', class_='col8 -pvs').find_all('li')
        
        print(f"Name: {name}")
        print(f"Amount: {amount}")
        print(f"Review: {review}")
        print(f"Rating: {rating}")

        print('Key Features')
        for i, feature in enumerate(features):
            if i > 25: # we ran out of letters
                break
            print(chr(ord("a")+i) + ":", feature.get_text(strip=True))
                   
        print('')
    except AttributeError:
        continue
Answered By: Nathan Mills