Web scraping table with missing attributes via Python Selenium and Pandas

Question:

I am scraping a table from a website, but I run into empty cells along the way. The try-except block below corrupts the data at the end. I also don't want to drop the whole row, because the remaining information is still relevant even when some attribute is missing.

try:
    for i in range(10):
        data = {'ID': IDs[i].get_attribute('textContent'),
                'holder': holder[i].get_attribute('textContent'),
                'view': view[i].get_attribute('textContent'),
                'material': material[i].get_attribute('textContent'),
                'Addons': addOns[i].get_attribute('textContent'),
                'link': link[i].get_attribute('href')}
        list.append(data)
except:
    print('Error')

Any ideas?

Asked By: Lilly

Answers:

One option is to put all the element lists whose attributes you want to read into a dictionary, like this:

objects={"IDs":IDs,"holder":holder,"view":view,"material":material...}

Then you can iterate through this dictionary, and if a specific element or attribute does not exist, simply store an empty string under the corresponding key. Something like this:

rows = []  # result list; the question calls it `list`, which shadows the built-in
the_keys = [key for key in objects]
# I assume the ID field will never be empty, so looping over its length
# only visits rows that actually exist.
for i in range(len(objects["IDs"])):
    data = {}
    for j in range(len(objects)):
        try:
            data[the_keys[j]] = objects[the_keys[j]][i].get_attribute('textContent')
        except Exception as e:
            # It is better to catch only the specific exception that is thrown
            # when the element does not exist (probably IndexError, when one
            # list is shorter than the IDs list).
            print("Exception: {}".format(e))
            data[the_keys[j]] = ""  # empty string marks a missing value
    rows.append(data)

I haven't tested this code, so I don't know whether it runs as-is, but it should give you an overall idea of how to solve your problem.
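
On the question of which exception to catch: indexing past the end of a Python list raises IndexError, and Selenium's get_attribute returns None when the attribute is absent. Here is a minimal sketch under those assumptions. safe_attribute is a hypothetical helper name, and FakeElement only stands in for a WebElement so the snippet runs without a browser:

```python
def safe_attribute(elements, index, name, default=""):
    """Return elements[index].get_attribute(name), or default if unavailable."""
    try:
        value = elements[index].get_attribute(name)
    except IndexError:
        # The list is shorter than the row index, so the cell is missing.
        return default
    # get_attribute returns None when the attribute is absent.
    return default if value is None else value


class FakeElement:
    """Stand-in for a Selenium WebElement (assumption, for demonstration only)."""

    def __init__(self, attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name)


holder = [FakeElement({"textContent": "Alice"})]
print(safe_attribute(holder, 0, "textContent"))        # existing cell
print(repr(safe_attribute(holder, 1, "textContent")))  # missing cell
```

Catching only IndexError means unrelated errors (for example a stale element) still surface instead of being silently turned into empty strings.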

If you have any questions, doubts, or concerns please ask away.

Edit: To read a different attribute for one of the objects, such as the href, you can simply include an if statement that checks the key. I also realized you can loop over the objects dictionary's keys and values directly instead of accessing each key and value by index. You could change the inner loop to this:

for key, value in objects.items():
    try:
        if key == "link":
            data[key] = value[i].get_attribute("href")
        else:
            data[key] = value[i].get_attribute("textContent")
    except Exception as e:
        print("Error: ", e)
        data[key] = ""
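
If more columns need a non-default attribute later, the if statement could be replaced by a small lookup table. This is only a sketch: ATTRIBUTE_FOR_KEY and build_row are hypothetical names, the example URL is made up, and FakeElement stands in for a Selenium WebElement so the snippet runs on its own:

```python
# Keys not listed here fall back to "textContent".
ATTRIBUTE_FOR_KEY = {"link": "href"}


def build_row(objects, i):
    """Collect the i-th entry of every column, using "" for missing cells."""
    row = {}
    for key, elements in objects.items():
        attribute = ATTRIBUTE_FOR_KEY.get(key, "textContent")
        try:
            row[key] = elements[i].get_attribute(attribute)
        except IndexError:
            # This column has no element at position i.
            row[key] = ""
    return row


class FakeElement:
    """Stand-in for a WebElement (assumption, for demonstration only)."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        return self._attributes.get(name)


objects = {
    "IDs": [FakeElement(textContent="1"), FakeElement(textContent="2")],
    "link": [FakeElement(href="https://example.com/1")],
}
rows = [build_row(objects, i) for i in range(len(objects["IDs"]))]
print(rows)
```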

Edit 2:

data = {}
for key in objects:
    data[key] = []  # one list per column
for key, value in objects.items():
    for i in range(len(objects["IDs"])):
        try:
            if key == "link":
                data[key].append(value[i].get_attribute("href"))
            else:
                data[key].append(value[i].get_attribute("textContent"))
        except Exception as e:
            print("Error: ", e)
            data[key].append("")

Try this. Here data is a dictionary of lists, one list per column, so you no longer have to append a data dictionary to a list. Without the original data I won't be able to help much more, but I believe this should work.
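
One caveat with index-based lists: if a cell is absent from the page, the separately collected lists can end up with different lengths, and values may shift into the wrong row. A possible alternative is to locate each row first and look up its cells relative to that row. The sketch below is not tested against the asker's page. The selectors are hypothetical, and FakeRow and FakeCell are stand-ins. With Selenium, each row would be a WebElement and the call would be row.find_elements(By.CSS_SELECTOR, selector), where By.CSS_SELECTOR is the string "css selector":

```python
CSS = "css selector"  # same value as Selenium's By.CSS_SELECTOR

# Hypothetical selectors; the real ones depend on the page markup.
SELECTORS = {"ID": "td.id", "holder": "td.holder"}


def scrape_row(row, selectors):
    """Read one table row, leaving "" for cells that are not present."""
    record = {}
    for key, selector in selectors.items():
        cells = row.find_elements(CSS, selector)
        # find_elements returns an empty list when nothing matches.
        record[key] = cells[0].get_attribute("textContent") if cells else ""
    return record


class FakeCell:
    """Stand-in for a cell WebElement (assumption, for demonstration only)."""

    def __init__(self, text):
        self._text = text

    def get_attribute(self, name):
        return self._text if name == "textContent" else None


class FakeRow:
    """Stand-in for a row WebElement; maps selector -> list of cells."""

    def __init__(self, cells):
        self._cells = cells

    def find_elements(self, by, value):
        return self._cells.get(value, [])


table_rows = [
    FakeRow({"td.id": [FakeCell("1")], "td.holder": [FakeCell("Alice")]}),
    FakeRow({"td.id": [FakeCell("2")]}),  # holder cell is missing
]
records = [scrape_row(row, SELECTORS) for row in table_rows]
print(records)
```

A list of dictionaries like records can be passed directly to pandas.DataFrame to build the final table.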

Answered By: Scoobylolo