AttributeError in for loop when web scraping

Question:

I am still learning Python. I am confused because I went through the process of pulling each of these tags for just the first result and everything worked beautifully, but when I put it into a loop, it throws an error.

For the sake of my learning, correct me if I'm wrong: I think this error is telling me that 'result' is a NoneType object and that's why I can't call a method on it. But I thought that writing for result in results: was all I needed to define it as the iteration variable.

import requests
from bs4 import BeautifulSoup

URL = 'https://www.zillow.com/eugene-or/rentals/'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36 Edg/104.0.1293.47",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
page = requests.get(URL, headers=headers)
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")
results = soup2.find_all('li', attrs={'class':'ListItem-c11n-8-69-2__sc-10e22w8-0 srp__hpnp3q-0 enEXBq with_constellation'})

records = []
for result in results:
    estate = result.find('address').text[16:-15].split('|')
    details = result.find('span').text[16:-15].split('+')
    link = 'https://www.zillow.com' + result.find('a')['href']
    records.append((estate,details,link))

Here is the error I get in the for loop:

AttributeError                            Traceback (most recent call last)
Input In [80], in <cell line: 4>()
      3 records = []
      4 for result in results:
----> 5     estate = result.find('address').text[16:-15].split('|')
      6     details = result.find('span').text[16:-15].split('+')
      7     link = 'https://www.zillow.com' + result.find('a')['href']

AttributeError: 'NoneType' object has no attribute 'text'
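A minimal reproduction of this error is sketched below. It uses small invented HTML rather than the real Zillow page (the class name "card" and the address text are made up) and shows that the loop variable is a Tag while find() returns None for an item with no <address>.

```python
from bs4 import BeautifulSoup

# Invented markup: the second <li> has no <address>, standing in for a
# listing card (an ad or placeholder, say) that lacks one.
html = """
<ul>
  <li class="card"><address>123 Main St | Eugene, OR</address></li>
  <li class="card"><span>Sponsored</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("li", class_="card")

for result in results:
    # The loop variable is always a Tag, never None...
    # ...but find() returns None when the tag has no <address> inside it.
    print(type(result).__name__, "->", result.find("address"))

error_message = None
try:
    addresses = [result.find("address").text for result in results]
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'NoneType' object has no attribute 'text'
```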

Thank you in advance for any input.

Asked By: Erika Loomis


Answers:

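The error does not mean that result is None. result.find('address') returns None whenever a matched element contains no <address> tag, and None has no .text attribute. That is why the code worked for the first result but fails once the loop reaches an element without one. There are different approaches to fixing this. One is to select your elements more specifically: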

soup2.select('article:has(address)')
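A small sketch of how that selector filters elements, run against invented HTML. It assumes, as the selector above implies, that each listing card on the real page is an <article>; the markup and address text here are made up.

```python
from bs4 import BeautifulSoup

# Invented markup: only the first <article> contains an <address>.
html = """
<ul>
  <li><article><address>123 Main St | Eugene, OR</address></article></li>
  <li><article><span>Sponsored</span></article></li>
  <li><div>Loading...</div></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS :has() (handled by the soupsieve package that bs4 uses for select())
# keeps only <article> elements with an <address> descendant.
cards = soup.select("article:has(address)")

for card in cards:
    # Every matched card has an <address>, so find() cannot return None here.
    print(card.find("address").get_text(strip=True))
```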

Or simply check whether the address exists before using it:

address = result.find('address')
estate = address.text[16:-15].split('|') if address else None
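The same None check can be applied to every find() call in the loop, since result.find('a')['href'] would fail in the same way on an element without a link. A minimal sketch on invented HTML (the href "/homedetails/example/" is hypothetical), calling find() once per tag and keeping None for missing values:

```python
from bs4 import BeautifulSoup

# Invented markup: the second card has no <address> and no <a>.
html = """
<ul>
  <li><address>123 Main St | Eugene, OR</address><a href="/homedetails/example/">View</a></li>
  <li><span>Sponsored</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

records = []
for result in soup.find_all("li"):
    # Call find() once per tag and reuse the result.
    address = result.find("address")
    anchor = result.find("a")
    # Store None instead of crashing when a tag is missing.
    estate = address.get_text(strip=True) if address is not None else None
    link = "https://www.zillow.com" + anchor["href"] if anchor is not None else None
    records.append((estate, link))

print(records)
```

If incomplete cards are not wanted at all, a "continue" when address is None would skip them instead of recording None.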

Also, instead of slicing, try stripping the text and splitting on the exact separator:

estate = result.find('address').get_text(strip=True).split(' | ')
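A sketch comparing the two on invented address text. Re-parsing prettify() output surrounds the text with newlines and indentation, which is presumably why the question needed fixed offsets like [16:-15]; get_text(strip=True) removes that whitespace regardless of how deep the tag is nested.

```python
from bs4 import BeautifulSoup

# Invented address markup; the real page's text may differ.
html = "<li><address>123 Main St | Eugene, OR 97401</address></li>"

# Mimic the question: parse, prettify, and parse again.
pretty = BeautifulSoup(BeautifulSoup(html, "html.parser").prettify(), "html.parser")
address = pretty.find("address")

# .text keeps the whitespace that prettify() added around the string.
print(repr(address.text))

# Strip the whitespace, then split on the exact " | " separator.
parts = address.get_text(strip=True).split(" | ")
print(parts)  # ['123 Main St', 'Eugene, OR 97401']
```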
Answered By: HedgeHog