BeautifulSoup giving me many error lines when used
Question:
I’ve installed BeautifulSoup (the folder named bs4) into my pythonProject folder, which is the same folder as the Python file I am running. The .py file contains the following code; as input I am using the URL below, which points to a simple page with one link that the code is supposed to retrieve.
URL used as url input: http://data.pr4e.org/page1.htm
.py code:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))
Though I could be wrong, it appears to me that bs4 imports correctly, because my IDE suggests BeautifulSoup when I begin typing it. After all, it is installed in the same directory as the .py file. However, it prints the following error lines when I run it with the URL given above:
Traceback (most recent call last):
  File "C:\Users\Thomas\PycharmProjects\pythonProject\main.py", line 16, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\__init__.py", line 215, in __init__
    self._feed()
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\__init__.py", line 241, in _feed
    self.endData()
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\__init__.py", line 315, in endData
    self.object_was_parsed(o)
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\__init__.py", line 320, in object_was_parsed
    previous_element = most_recent_element or self._most_recent_element
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\element.py", line 1001, in __getattr__
    return self.find(tag)
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\element.py", line 1238, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\element.py", line 1259, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\element.py", line 516, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\element.py", line 1560, in __init__
    self.text = self._normalize_search_value(text)
  File "C:\Users\Thomas\PycharmProjects\pythonProject\bs4\element.py", line 1565, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
Process finished with exit code 1
The lines referred to in the error messages are in files inside bs4 that were downloaded as part of it. I haven’t edited or even touched any of the files bs4 contains. Can anyone help me figure out why bs4 isn’t working?
Answers:
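Are you using Python 3.10? It looks like this copy of the BeautifulSoup library still uses the deprecated aliases to the collections abstract base classes (such as collections.Callable). Those aliases were removed from the collections module in Python 3.10; the classes are now available only from collections.abc. More info here: https://docs.python.org/3/whatsnew/3.10.html#removed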
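To confirm that this is what is happening, something like the sketch below could be run with the same interpreter as the failing script. It uses only the standard library; the function name describe_callable_alias is made up for illustration.

```python
import collections
import collections.abc
import sys


def describe_callable_alias():
    """Return the 'major.minor' interpreter version and whether the
    old collections.Callable alias is still present."""
    version = "{}.{}".format(*sys.version_info[:2])
    has_alias = hasattr(collections, "Callable")
    return version, has_alias


version, has_alias = describe_callable_alias()
print("Python", version, "- collections.Callable present:", has_alias)
# The class itself still exists under collections.abc.
print("collections.abc.Callable present:", hasattr(collections.abc, "Callable"))
```

If the first line reports that the alias is missing while the second reports True, the AttributeError in the traceback is expected for any code that still refers to collections.Callable.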
A quick fix is to paste these 2 lines just below your imports:
import collections.abc
collections.Callable = collections.abc.Callable
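A slightly more defensive version of the same workaround, as a minimal sketch: it imports collections.abc explicitly and only adds the alias when it is missing, so it should be harmless on older Python versions too. It assumes, based on the traceback, that the bs4 copy looks up collections.Callable when BeautifulSoup(...) is called, so these lines need to run before that call.

```python
import collections
import collections.abc

# Assumption: the old bs4 copy reads collections.Callable at call time,
# so the alias only has to exist before BeautifulSoup(...) is invoked.
if not hasattr(collections, "Callable"):
    collections.Callable = collections.abc.Callable

# Quick self-check: a built-in function is callable, a plain string is not.
print(isinstance(len, collections.Callable))
print(isinstance("text", collections.Callable))
```

This only patches over the symptom. Newer BeautifulSoup releases should no longer rely on the removed alias, so replacing the bs4 folder in the project with a current version is probably the more durable fix.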
Andrey, I cannot comment yet, but I tried your fix. I'm using Thonny, and Python 3.10 in the terminal. After adding the two lines (the collections import and the Callable assignment), I get another error in Thonny that isn't shown in the terminal. When I run the program in the terminal, it simply seems to do nothing. Thonny reports: "Module has no attribute "Callable"".