jQuery-like HTML parsing in Python?

Question

Is there a way in Python to parse an HTML document the way jQuery does?

That is, I'd like to use CSS selector syntax to grab an arbitrary set of nodes from the document, read their content and attributes, and so on.

Asked By: Roy Tang

Answer 1

The lxml library supports CSS selectors.

Answered By: Ignacio Vazquez-Abrams

Answer 2

If you are fluent with BeautifulSoup, you could just add soupselect to your libs.
Soupselect is a CSS selector extension for BeautifulSoup.

Usage:

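from bs4 import BeautifulSoup as Soup
from soupselect import select
from urllib.request import urlopen  # Python 3 location of urlopen

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = Soup(urlopen('http://slashdot.org/'), 'html.parser')
select(soup, 'div.title h3')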

    [<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
     <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
    ..]

Answered By: systempuntoout

Answer 3

Consider PyQuery:

http://packages.python.org/pyquery/

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> import urllib.request
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url='http://google.com/')
>>> d = pq(url='http://google.com/', opener=lambda url: urllib.request.urlopen(url).read())
>>> d = pq(filename=path_to_html_file)
>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> p.html()
'Hello world !'
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> p.html()
'you know <a href="http://python.org/">Python</a> rocks'
>>> p.text()
'you know Python rocks'

Answered By: Luke Stanley

Answer 4

BeautifulSoup now has built-in support for CSS selectors:

import requests
from bs4 import BeautifulSoup as Soup
html = requests.get('https://stackoverflow.com/questions/3051295').content
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = Soup(html, 'html.parser')

Title of this question

soup.select('h1.grid--cell :first-child')[0].text

Number of question upvotes

# select_one() returns only the first matching element
soup.select_one('[itemprop="upvoteCount"]').text

The page itself is fetched with the Python Requests library.
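
Since the selectors above depend on Stack Overflow's live markup, here is a small offline sketch of the same select() / select_one() calls. The HTML snippet, class names, and selectors are invented for illustration and are not taken from any real page:

```python
from bs4 import BeautifulSoup

# Hypothetical document used only for this illustration.
html = """
<html><body>
  <div class="title"><h3><a href="/science">Science</a></h3></div>
  <div class="title"><h3><a href="/tech">Tech</a></h3></div>
  <div class="other"><h3><a href="/misc">Misc</a></h3></div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element matching the CSS selector.
links = soup.select("div.title h3 a")
texts = [a.get_text() for a in links]
hrefs = [a["href"] for a in links]

# select_one() returns the first match, or None when nothing matches.
first = soup.select_one("div.title a")
missing = soup.select_one("p.does-not-exist")

print(texts, hrefs, first["href"], missing)
```

Because select_one() can return None, it is worth checking the result before reading .text or an attribute on pages whose layout may change.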

Answered By: imbr
