Why does saving text containing HTML in a variable cause unexpected behavior with beautifulsoup4?

Question:

I am using beautifulsoup to automate posting products on one of the shopping platforms. Unfortunately their API is currently disabled, so the only option right now is to use beautifulsoup.

How is the program expected to work?

The program is expected to read a .csv file (I provide the file name) and store the product data in variables. After that, it goes through multiple steps (filling out the form), such as inputting the name, which it gets from a variable. For example:

ime_artikla = driver.find_element(By.ID, 'kategorija_sug').send_keys(csvName) #Here it inputs the name

where csvName is a value passed to the function along with some other parameters:

def postAutomation(csvName, csvPrice, csvproductDescription):

I read the file as follows:

filename = open(naziv_fajla, 'r', encoding="utf-8") # File to open; utf-8 encoding was necessary
file = csv.DictReader(filename)

The lines above are inside a try: statement.

I read the columns from the CSV file as follows:
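
As a self-contained illustration of this reading step, the sketch below parses CSV text whose Description column holds quoted HTML spanning several lines. The sample data and the `read_products` helper are hypothetical; only the column names follow the ones used in this question. The `csv` module documentation recommends opening files with `newline=''` so that newlines embedded in quoted fields are handled correctly.

```python
import csv
import io

# Hypothetical sample data: the Description field is quoted and spans lines.
SAMPLE_CSV = (
    'SKU,Name,Price,Description\n'
    'A1,Lamp,19.99,"<p>Some product description.\n'
    '<ul>\n<li>Product feature 1</li>\n</ul>"\n'
)


def read_products(stream):
    """Return all rows of a CSV stream as a list of dicts."""
    return list(csv.DictReader(stream))


# With a real file: open(naziv_fajla, 'r', encoding='utf-8', newline='')
with io.StringIO(SAMPLE_CSV, newline='') as stream:
    products = read_products(stream)

# repr() makes the embedded newline characters visible.
print(repr(products[0]['Description']))
```

The point of the example is that a quoted field keeps its real newline characters after parsing, so whatever consumes the description later receives them too.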

for col in file:
            print("Reading column with following SKU: " + color.GREEN + color.BOLD + (col['SKU']) + color.END + "\n")
            csvSKU = (col['SKU'])
            csvName = (col['Name'])
            #csvCategory = (col['Categories'])
            csvPrice = (col['Price'])
            csvproductDescription = (col['Description'])
            print(csvName)
            #print(csvCategory)
            print(csvPrice)
            print(csvproductDescription)
            postAutomation(csvName, csvPrice, csvproductDescription)
            i+=1
            counterOfProducts = counterOfProducts + 1

This works as expected (the product is published on the online store successfully) until the product description contains HTML and/or inline CSS.

The problem:

As I said, the problem happens when a column contains HTML.

I populate the product description field through Tools > Source Code; the site uses TinyMCE for editing text, entering the description, etc.

There are actually two scenarios in which the program does not behave as expected:
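
As an aside, TinyMCE also exposes a JavaScript API, so editor content can be set without simulating keystrokes. The sketch below is not the method used in this question; it assumes the page exposes a global `tinymce` object, that the description editor is the active one, and that `driver` offers Selenium's `execute_script`. The `set_tinymce_content` helper is a hypothetical name.

```python
def set_tinymce_content(driver, html):
    """Set the active TinyMCE editor's content via JavaScript.

    Assumes the page exposes a global `tinymce` object and that the
    description editor is the active one.
    """
    # arguments[0] is the html string forwarded by execute_script.
    driver.execute_script(
        "tinymce.activeEditor.setContent(arguments[0]);", html
    )
```

With a Selenium WebDriver this might be called as `set_tinymce_content(driver, final_description)`. Because nothing is typed, whitespace in the string is not turned into key presses.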

  1. Case:

In the first case, the product is published successfully, but the <li> and \n are not treated as HTML for some reason. Here is an example of one product's description where this problem occurs:

<p style="text-align: center;">Some product description.\n<ul>\n <li>Product feature 1</li>\n         <li>Prod Feature 2</li>\n<li>Prod Feature 3</li>\n<li>Prod Feature 3</li>\n<li>Prod feature 4</li>\n</ul>

What I get when I submit this code:

\n\nProduct feature 1\nProd Feature 2\nProd Feature 3\nProd Feature 3\nProd feature 4\n
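
When output like this appears, it can help to check exactly which characters the description contains. The sketch below uses a hypothetical helper (`newline_report` is not part of the original program) that counts real newline characters separately from the two-character sequence of a backslash followed by n, since the two can look alike when printed but behave differently.

```python
def newline_report(text):
    """Count real newlines and literal backslash-n sequences in text."""
    return {
        'real_newlines': text.count('\n'),
        'literal_backslash_n': text.count('\\n'),
    }


# Same markup, once with real newlines and once with literal backslash-n.
sample_real = '<ul>\n<li>Product feature 1</li>\n</ul>'
sample_literal = r'<ul>\n<li>Product feature 1</li>\n</ul>'

print(newline_report(sample_real))
print(newline_report(sample_literal))
```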

  2. Case:

In the second case, the program crashes. Here is what happens:

Somehow the product description taken from the CSV file confuses the program (I think due to the complex HTML): part of the product description, such as &nbsp..., ends up in the price field. That field is on a completely different page (you have to click Next at the end of the page where the product description goes and then input the price), which seems weird to me.

The weird thing is that I have a template for the product description (HTML and CSS) saved in string literals, as template1 = """ A LOT OF HTML AND INLINE CSS """ and end_of_template = """ A LOT OF HTML AND INLINE CSS """, and it renders perfectly after doing this:

final_description = template1 + csvproductDescription + end_of_template

But the HTML and inline CSS inside the csvproductDescription variable are not treated as HTML and CSS.

How can I fix this?

Asked By: smack857

||

Answers:

It seems the problem was that I had whitespace characters inside the product description, so I solved it like this:

import re

# Build the full description, then collapse every run of whitespace into one space
final_description = html_and_css
final_description = final_description + csvproductDescription
final_description = final_description + html_and_css2
final_description = " ".join(re.split(r"\s+", final_description, flags=re.UNICODE))
Answered By: smack857
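
For reference, a minimal sketch of the same whitespace-collapsing idea wrapped in a function. The `collapse_whitespace` name and the sample description are illustrative, not taken from the original program, and unlike the answer's version this one also trims leading and trailing whitespace.

```python
import re


def collapse_whitespace(text):
    """Replace every run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


description = (
    '<p style="text-align: center;">Some product description.\n'
    '<ul>\n <li>Product feature 1</li>\n         <li>Prod Feature 2</li>\n</ul>'
)
print(collapse_whitespace(description))
```

Note that this also collapses whitespace where it may be significant, for example inside `<pre>` elements, so it suits descriptions where layout comes from the tags rather than from the source formatting.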