BeautifulSoup output into List

Question:

I am scraping product descriptions from a website. Each product has an "Old Price" and a "New Price", except one, which has only the "New Price". I append the scraped values to four initially empty lists: "Product Names", "Product Old Price", "Product New Price" and "Product Reviews". When I try to build a CSV file from them, I get the error "arrays must all be the same length". The cause is that the "Product Old Price" list has 17 entries while the other three lists have 18, because one product has no old price. Below is my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.petplanet.co.uk/d7/dog_food"
r = requests.get(url)
soup = BeautifulSoup(r.content)
prod_name =[]
prod_old_price = []
prod_new_price = []
prod_reviews = []
item = soup.findAll("a", class_ = "thumbLink")
for name in item[0:15]:
    pro_name = name.get("title")
    prod_name.append(pro_name)
price = soup.findAll("span", class_ = "price right")
for prices in price:
    pro_new_price1 = prices.text
    pro_new_price = pro_new_price1.replace("آ"," ")
    prod_new_price.append(pro_new_price)
old_price = soup.findAll("span", class_ = "price-old")
for old_pri in old_price:
    pro_old_price = old_pri.text
    prod_old_price.append(pro_old_price)

reviews = soup.findAll("span", class_ = "text-prod-review-score")
for rev in reviews:
    pro_reviews = (len(rev))
    prod_reviews.append(pro_reviews)

pet_products = pd.DataFrame({"Product Name": prod_name, "Product Old Price": prod_old_price, "Product New Price": prod_new_price, "Product Reviews     as # of Star": prod_reviews})
pet_products.to_csv("Pets Products.csv")

I want "N/A" or "None" to appear wherever no "Product Old Price" is given.
Or is there another way to handle this?
Thanks

Asked By: Muhammad Rehan

||

Answers:

Recommendation

Loop over the products one at a time and build a list of dicts, which I think is easier to handle. Also use find_all() instead of the older findAll() spelling.
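As a rough illustration of why a list of dicts is easier to handle, the sketch below builds a DataFrame from two made-up records (the product names and prices are placeholders, not scraped data). A key that is missing from one dict simply becomes NaN in that row, so the columns can never end up with different lengths:

```python
import pandas as pd

# Hypothetical sample records; the second product has no old price,
# so that key is simply left out of its dict.
records = [
    {"name": "Food A", "new_price": "£6.99", "old_price": "£11.49"},
    {"name": "Food B", "new_price": "£2.19"},
]

df = pd.DataFrame(records)  # the missing key becomes NaN in row 1
df["old_price"] = df["old_price"].fillna("NA")  # replace NaN with a label
print(df)
```

Whether to keep NaN or replace it with a label such as "NA" is a matter of taste; NaN is usually more convenient if the column will be processed further.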

What happens?

The old_price element is not in the page source when a product is not on sale, so searching the whole page for each field separately gives you no way to tell which position the missing NA value belongs in.
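The effect can be reproduced on a small piece of made-up markup (an assumption for illustration, not the real page, although it reuses the class names from the question). Searching the whole document for each field gives lists of different lengths, and the old prices silently shift onto the wrong products:

```python
from bs4 import BeautifulSoup

# Made-up markup, not the real page: the second product has no old price.
html = """
<ul>
  <li><a class="thumbLink" title="Food A"></a>
      <span class="price-old">£11.49</span><span class="price right">£6.99</span></li>
  <li><a class="thumbLink" title="Food B"></a>
      <span class="price right">£2.19</span></li>
  <li><a class="thumbLink" title="Food C"></a>
      <span class="price-old">£1.00</span><span class="price right">£0.89</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Page-wide searches, one list per field, as in the question.
names = [a.get("title") for a in soup.find_all("a", class_="thumbLink")]
new_prices = [s.get_text(strip=True) for s in soup.find_all("span", class_="price right")]
old_prices = [s.get_text(strip=True) for s in soup.find_all("span", class_="price-old")]

print(len(names), len(new_prices), len(old_prices))  # 3 3 2
# Pairing by position now attaches Food C's old price to Food B.
print(list(zip(names, old_prices)))
```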

Take a look at my example. If a product has no old_price, find() returns None and calling get_text() on it raises an error, which you can catch to set the NA value:

try:
    old_price = product.find("span", class_ = "price-old").get_text(strip=True)
except AttributeError:
    # find() returned None: this product has no old price
    old_price = 'NA'
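The same fallback can probably also be written without exceptions, by checking whether find() returned None before reading the text. The helper text_or_default below is a hypothetical name introduced for this sketch, and the markup is made up:

```python
from bs4 import BeautifulSoup

# Made-up markup for one product that has no old price.
html = '<li><span class="price right">£2.19</span></li>'
product = BeautifulSoup(html, "html.parser").li


def text_or_default(parent, name, class_, default="NA"):
    """Hypothetical helper: text of the first match, or `default` if none."""
    tag = parent.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else default


old_price = text_or_default(product, "span", "price-old")
new_price = text_or_default(product, "span", "price right")
print(old_price, new_price)
```

Compared with try/except, the explicit check only covers the "element not found" case, so unrelated errors are not swallowed by accident.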

Example

from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.petplanet.co.uk/d7/dog_food"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly to avoid the bs4 warning

p_data = []

for product in soup.select('div#box-scroll-content li'):
    new_price = product.find("span", class_ = "price right").get_text().replace("آ"," ")
    try:
        old_price = product.find("span", class_ = "price-old").get_text(strip=True)
    except AttributeError:
        # find() returned None: this product has no old price
        old_price = 'NA'
    
    p_data.append({
        'new_price': new_price,
        'old_price': old_price
    })
    
pd.DataFrame(p_data)

Output

  new_price old_price
0    £69.99    £76.99
1     £2.19        NA
2     £6.99    £11.49
3     £6.99    £10.99
4     £0.89     £1.00
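To end up with a CSV file as in the question, the list of dicts can be passed through DataFrame.to_csv. The sketch below assumes the missing price was stored as None rather than as the string 'NA', and lets the na_rep parameter decide how it is written. The two rows are copied from the output above, and the text is written to an in-memory buffer instead of a real file:

```python
import io

import pandas as pd

# Two rows copied from the output above; None marks the missing old price.
p_data = [
    {"new_price": "£69.99", "old_price": "£76.99"},
    {"new_price": "£2.19", "old_price": None},
]

buffer = io.StringIO()  # stands in for a file such as "Pets Products.csv"
pd.DataFrame(p_data).to_csv(buffer, index=False, na_rep="NA")

csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False leaves out the row numbers, which are rarely wanted in the exported file.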
Answered By: HedgeHog