Difference between "findAll" and "find_all" in BeautifulSoup

Question:

I would like to parse an HTML file with Python, and the module I am using is BeautifulSoup.

It is said that the function find_all is the same as findAll. I’ve tried both of them, but I believe they are different:

import urllib, urllib2, cookielib
from BeautifulSoup import *
site = "http://share.dmhy.org/topics/list?keyword=TARI+TARI+team_id%3A407"

rqstr = urllib2.Request(site)
rq = urllib2.urlopen(rqstr)
fchData = rq.read()

soup = BeautifulSoup(fchData)

t = soup.findAll('tr')

Can anyone tell me the difference?

Asked By: Oberon


Answers:

In BeautifulSoup version 4, the two methods are exactly the same. The mixed-case names (findAll, findAllNext, nextSibling, etc.) have all been renamed to conform to the Python style guide (PEP 8), but the old names are still available to make porting easier. See "Method names" in the BeautifulSoup documentation for a full list.

In new code, you should use the lowercase versions, so find_all, etc.

In your example, however, you are using BeautifulSoup version 3 (discontinued since March 2012; don’t use it if you can help it), where only findAll() is available. Unknown attribute names (such as .find_all, which is only available in BeautifulSoup 4) are treated as if you are searching for a tag by that name. There is no <find_all> tag in your document, so soup.find_all evaluates to None, and calling it as soup.find_all('tr') then fails because None is not callable.
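The fallback described above can be pictured with a small stand-alone sketch. It does not use BeautifulSoup at all: ToyTag and its methods are hypothetical stand-ins, written only to mimic the behaviour described here, in which an unknown attribute name is looked up as a tag name.

```python
class ToyTag(object):
    """Hypothetical stand-in for a BS3-style tag; not BeautifulSoup code."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        """Return the first descendant tag called `name`, or None."""
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def findAll(self, name):
        """Return every descendant tag called `name`."""
        matches = []
        for child in self.children:
            if child.name == name:
                matches.append(child)
            matches.extend(child.findAll(name))
        return matches

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails, so an unknown
        # name such as `find_all` is treated as a search for a tag.
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)


doc = ToyTag('html', [ToyTag('table', [ToyTag('tr'), ToyTag('tr')])])

rows = doc.findAll('tr')   # a real method: returns a list of two rows
fallback = doc.find_all    # no such method and no <find_all> tag: None

try:
    doc.find_all('tr')     # equivalent to calling None('tr')
except TypeError as exc:
    error = str(exc)
```

With BeautifulSoup 4, imported as `from bs4 import BeautifulSoup`, find_all is a real method, so this fallback never comes into play for that name.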

Answered By: Martijn Pieters

From the source code of BeautifulSoup 4 (an excerpt from a class body in bs4/element.py):

http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/element.py#L1260

    def find_all(self, name=None, attrs={}, recursive=True, text=None,
                 limit=None, **kwargs):
        # ...
        # ...

    findAll = find_all       # BS3
    findChildren = find_all  # BS2

Answered By: kmonsoor
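The aliasing pattern in that excerpt can be tried in isolation. The following is a minimal sketch with a hypothetical ToySoup class (it is not BeautifulSoup code, and its find_all only filters a list of strings); it shows that a class-level assignment binds the old name to the very same function, so both spellings return the same result.

```python
class ToySoup(object):
    """Hypothetical class; only the aliasing pattern mirrors the excerpt."""

    def __init__(self, tags):
        self.tags = list(tags)

    def find_all(self, name):
        """Return every stored tag name equal to `name`."""
        return [tag for tag in self.tags if tag == name]

    # Class-level alias: the old name is bound to the same function object.
    findAll = find_all


soup = ToySoup(['tr', 'td', 'tr'])

# Compare the raw class attributes to confirm there is a single function.
same_function = ToySoup.__dict__['findAll'] is ToySoup.__dict__['find_all']

new_style = soup.find_all('tr')
old_style = soup.findAll('tr')
```

Because the alias adds no wrapper, the two names behave identically in this sketch; the only difference is the spelling.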