How to scrape football results from livescores?

Question:

I have a project I'm working on in Python 3.4 on Windows 10 (64-bit). I want to scrape livescore.com for football scores (results), e.g. getting all of the day's scores (England 2-2 Norway, France 2-1 Italy, etc.).

I have tried two approaches. Here is the first one:

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('http://www.livescore.com/').read()
soup = bs.BeautifulSoup(sauce,'lxml')

for div in soup.find_all('div', class_='container'):
    print(div.text)

When I run this code, a dialog box pops up saying:

IDLE’s subprocess didn’t make connection. Either IDLE can’t start a subprocess or firewall software is blocking the connection.

So I wrote another version. This is the code:

# Import Modules
import urllib.request
import re

# Downloading Live Score XML Code From Website and reading also
xml_data = urllib.request.urlopen('http://static.cricinfo.com/rss/livescores.xml').read()

# Pattern For Searching Score and link
pattern = "<item>(.*?)</item>"

# Finding Matches
for i in re.findall(pattern, xml_data, re.DOTALL):
    result = re.split('<.+?>',i)
    print (result[1], result[3]) # Print Score

And I got this error:

Traceback (most recent call last):
  File "C:\Users\Bright\Desktop\live_score.py", line 12, in <module>
    for i in re.findall(pattern, xml_data, re.DOTALL):
  File "C:\Python34\lib\re.py", line 206, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
Asked By: Bright

||

Answers:

For your first example: the site loads its content with JavaScript, so the HTML that urllib downloads won't contain most of the scores. I suggest using Selenium to fetch the page source after the browser has rendered it.

Your code should look like this:

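import bs4 as bs
from selenium import webdriver

url = 'http://www.livescore.com/'

# Let a real browser load the page so its JavaScript runs,
# then grab the rendered HTML and close the browser.
browser = webdriver.Chrome()
browser.get(url)
sauce = browser.page_source
browser.quit()

soup = bs.BeautifulSoup(sauce, 'lxml')

# Print the text of every <div> inside the div marked data-type="container"
for div in soup.find('div', attrs={'data-type': 'container'}).find_all('div'):
    print(div.text)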
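The loop above prints each row's raw text. If the goal is structured results like "England 2-2 Norway", a small regular expression can split each line into teams and goals. This is only a sketch: it assumes each row's text looks like `Home 2-1 Away` (optionally with spaces around the dash), which is not confirmed for livescore.com's current markup, and `parse_score` and the sample rows are hypothetical.

```python
import re

# Assumption: a result line looks like "Home 2-1 Away", possibly with
# spaces around the dash. Team names may themselves contain digits.
SCORE_RE = re.compile(
    r'^\s*(?P<home>.+?)\s+(?P<home_goals>\d+)\s*-\s*(?P<away_goals>\d+)\s+(?P<away>.+?)\s*$'
)


def parse_score(line):
    """Split one result line into (home, home_goals, away_goals, away).

    Returns None for lines that don't look like a result (headers, "FT", etc.).
    """
    m = SCORE_RE.match(line)
    if m is None:
        return None
    return (m.group('home'), int(m.group('home_goals')),
            int(m.group('away_goals')), m.group('away'))


# Hypothetical rows standing in for the div.text values printed above
rows = ["England 2-2 Norway", "France 2 - 1 Italy", "FT"]
results = [r for r in map(parse_score, rows) if r is not None]
for home, hg, ag, away in results:
    print("{} {}-{} {}".format(home, hg, ag, away))
```

Lines that don't match simply return None, so headers or status rows mixed into the page text are skipped rather than raising an error.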

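For the second example, the regular expression engine raises an error because read() returns bytes, and a str pattern can't be matched against a bytes object. So you just have to convert xml_data to str first. Decoding is the proper way to do that; str(xml_data) also avoids the error, but it gives you the bytes' repr (b'...' with escape sequences such as \n) rather than the actual text.

This is the modified code:

for i in re.findall(pattern, xml_data.decode('utf-8'), re.DOTALL):  # assumes the feed is UTF-8
    result = re.split('<.+?>', i)
    print(result[1], result[3])  # Print Score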
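As an alternative to matching the feed with regular expressions, the standard library's xml.etree.ElementTree can parse RSS directly, and ET.fromstring() accepts the raw bytes returned by read(), so no decoding step is needed. A minimal sketch, assuming the feed is ordinary RSS 2.0 with <title> and <link> children in each <item> (which the regex indices result[1] and result[3] suggest); `rss_items` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_bytes):
    """Return (title, link) pairs for every <item> in an RSS feed.

    ET.fromstring() accepts the raw bytes from urlopen(...).read(), and the
    parser uses the encoding named in the XML declaration.
    """
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.iter('item'):
        # findtext() returns the default '' if the child element is missing
        title = item.findtext('title', default='').strip()
        link = item.findtext('link', default='').strip()
        items.append((title, link))
    return items


# Hypothetical sample feed standing in for the downloaded xml_data
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Live scores</title>
    <item>
      <title>England 2-2 Norway</title>
      <link>http://example.com/match/1</link>
    </item>
    <item>
      <title>France 2-1 Italy</title>
      <link>http://example.com/match/2</link>
    </item>
  </channel>
</rss>"""

for title, link in rss_items(sample):
    print(title, link)
```

Unlike the split-on-tags approach, this doesn't depend on the order of the child elements inside each <item>.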
Answered By: chad

How can I find the XML data link to use for the xml_data variable? The one used here is for cricket, but I need the football one. Any help would be much appreciated.

Answered By: George