Error using BeautifulSoup

Question:

I want to extract some data from a website. I saved it as ‘Webpage, HTML Only’, in a file called soccerway.html on my Desktop.

Afterwards I ran the following code in an IPython notebook:

from bs4 import BeautifulSoup
soup=BeautifulSoup(open("soccerway.html"))

I get the following error:

IOError: [Errno 2] No such file or directory: 'soccerway.html'

How can I solve this?

Asked By: user3486076


Answers:

You don’t need to save the page manually. In Python 2, use urllib2 to fetch the HTML source directly (in Python 3 the same urlopen function lives in urllib.request):

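from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2 only

# Fetch the page and hand the response straight to BeautifulSoup.
# Naming the parser explicitly keeps the result consistent across machines.
soup = BeautifulSoup(urlopen("http://my_site.com/mypage"), "html.parser")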

Example (calling soup('a') is shorthand for soup.find_all('a')):

>>> from bs4 import BeautifulSoup
>>> from urllib2 import urlopen
>>> soup = BeautifulSoup(urlopen('http://google.com'))
>>> soup('a')
[<a class="gb1" href="http://www.google.com/imghp?hl=en&amp;tab=wi">Images</a>, 
 ...
]
Answered By: alecxe
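
For readers on Python 3, where urllib2 no longer exists, the same idea might look like the sketch below. The helper name fetch_soup is hypothetical and only there for illustration; urlopen from urllib.request and the "html.parser" argument are standard.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_soup(url):
    """Download `url` and return the parsed document (hypothetical helper)."""
    with urlopen(url) as response:
        html = response.read()  # raw bytes; BeautifulSoup works out the encoding
    return BeautifulSoup(html, "html.parser")
```

Calling fetch_soup("http://google.com")("a") would then list the links on the page, as in the session above.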

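Open the file yourself and pass the file object to BeautifulSoup together with an explicit parser. A relative name such as "yourfile.html" is resolved against the notebook's current working directory, so if the file is somewhere else (for example on the Desktop), use its full path: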

from bs4 import BeautifulSoup

# Open the saved page and parse it; the with block closes the file afterwards.
with open("yourfile.html", "r") as f:
    soup = BeautifulSoup(f, "html.parser")
Answered By: tofi1130
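
The error in the question most likely means the notebook's working directory is not the Desktop, so the bare name 'soccerway.html' is looked up in the wrong folder. A minimal sketch of one way to check this and build a full path follows. It assumes the Desktop folder is at ~/Desktop, which varies by system, and the helper names desktop_path and load_soup are hypothetical.

```python
import os

from bs4 import BeautifulSoup


def desktop_path(filename):
    """Return an absolute path to `filename` in ~/Desktop (assumed location)."""
    return os.path.join(os.path.expanduser("~"), "Desktop", filename)


def load_soup(path):
    """Parse the HTML file at `path`, reporting the working directory if it is missing."""
    if not os.path.isfile(path):
        raise IOError("No such file: %s (working directory is %s)" % (path, os.getcwd()))
    with open(path, "r") as f:
        return BeautifulSoup(f, "html.parser")


# Relative names such as "soccerway.html" are resolved against this directory.
print(os.getcwd())
```

With these helpers, load_soup(desktop_path("soccerway.html")) would read the file from the Desktop regardless of where the notebook was started.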