BeautifulSoup "AttributeError: 'NoneType' object has no attribute 'text'"

Question:

I was scraping a Google weather search result with bs4, and Python can't find a <span> tag even though it is there in the page. How can I solve this problem?

I tried to find this <span> with the class and the id, but both failed.

<div id="wob_dcp">
    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    
</div>

Above is the HTML I was trying to scrape from the page, and this is my code:

response = requests.get('https://www.google.com/search?hl=ja&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text

But this code failed with the following error:

Traceback (most recent call last):
  File "C:Userssungn_000Desktopweather.py", line 23, in <module>
    tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Please solve this error.

Asked By: sjk1204


Answers:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(a)
>>> a
'<div id="wob_dcp">\n    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    \n</div>'
>>> soup.find("span", id="wob_dc").text
'Clear with periodic clouds'

Try this out.

Answered By: Pravin

This is because the weather section is rendered by the browser via JavaScript, so when you use requests you only get the HTML content of the page, which doesn't contain what you need.
You should use, for example, selenium (or requests-html) if you want to parse a page whose elements are rendered by a web browser.

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://www.google.com/search?hl=en&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
print(tomorrow_weather)

Output:

pawel@pawel-XPS-15-9570:~$ python test.py
Clear with periodic clouds
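
If you prefer rendering the page in a real browser instead, a minimal selenium sketch could look like the following (the shortened query URL is only an illustration, and it assumes Chrome with a matching chromedriver is available):

from bs4 import BeautifulSoup
from selenium import webdriver

# Assumes Chrome and a matching chromedriver are installed and discoverable.
driver = webdriver.Chrome()
driver.get('https://www.google.com/search?hl=en&q=sakuragawa+weather+tomorrow')
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

print(soup.find('span', {'id': 'wob_dc'}).text)
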
Answered By: pawelbylina

I also had this problem.
You should not import it like this:

from bs4 import BeautifulSoup

You should import it like this instead:

from bs4 import * 

This should work.

Answered By: Arson Basak

It's not rendered via JavaScript, contrary to what pawelbylina mentioned, and you don't have to use requests-html or selenium since everything you need is already in the initial HTML; rendering the page would also slow down the scraping process a lot.

It could be because no user-agent is specified, so Google blocks your request and you receive different HTML containing some sort of error, because the default requests user-agent is python-requests. Google recognizes it and blocks the request since it's not a "real" user visit. Check what your user-agent is.
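
For example, you can print the user-agent that requests sends by default (a quick check; requests.utils.default_user_agent() is part of the requests library):

import requests

# Prints something like 'python-requests/2.28.1' - the value Google sees
# when no custom headers are passed.
print(requests.utils.default_user_agent())
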

Pass the user-agent into the request headers:

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get("YOUR_URL", headers=headers)

You're looking for this; use select_one() to grab just one element:

soup.select_one('#wob_dc').text

Have a look at SelectorGadget Chrome extension to grab CSS selectors by clicking on the desired elements in your browser.
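
As a side note, both find() and select_one() return None when nothing matches, which is exactly what produces the AttributeError from the question. A small defensive sketch:

element = soup.select_one('#wob_dc')
weather_condition = element.text if element else None  # no AttributeError when the element is missing
print(weather_condition)
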


Code and full example that scrapes more in the online IDE:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "일본 桜川市真壁町古城 내일 날씨",
  "hl": "en",
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

location = soup.select_one('#wob_loc').text
weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Location: {location}\n'
      f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')

------
'''
Location: Makabecho Furushiro, Sakuragawa, Ibaraki, Japan
Weather condition: Cloudy
Temperature: 79°F
Precipitation: 40%
Humidity: 81%
Wind speed: 7 mph
Current time: Saturday
'''

Alternatively, you can achieve the same thing by using the Direct Answer Box API from SerpApi. It’s a paid API with a free plan.

The difference in your case is that you don't have to think about how to bypass blocks from Google or figure out why data from certain elements isn't being extracted as it should, since that's already handled for the end user. The only thing that needs to be done is to iterate over the structured JSON and grab the data you want.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "일본 桜川市真壁町古城 내일 날씨",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')

--------
'''
Makabecho Furushiro, Sakuragawa, Ibaraki, Japan
Saturday
Cloudy
79°F
40%
81%
7 mph
'''

Disclaimer, I work for SerpApi.

Answered By: Dmitriy Zub