Explain sum(int(td.text) for td in soup.select('td:last-child')[1:])

Question:

I came across this piece of code while solving a problem. I cannot understand how the last line before the print call works. Please explain.

import re
import urllib.request
from bs4 import BeautifulSoup

# url = 'http://py4e-data.dr-chuck.net/comments_42.html'
url = 'http://py4e-data.dr-chuck.net/comments_228869.html'

soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html.parser')
s = sum(int(td.text) for td in soup.select('td:last-child')[1:])

print(s)
Asked By: Raju Ritigya


Answers:

This is the order of operations:

  • soup.select('td:last-child') is a method call that returns a list of the matching elements, here every td that is the last child of its parent
  • [1:] is a slicing operation that creates a new list without the first (zeroth) item
  • for td in is the loop part of a generator expression; each item of the sliced list is assigned to td in turn
  • int(td.text) takes the "text" attribute of the object in td and then creates its integer equivalent
  • sum() sums those integers as they are generated
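
The same steps can be tried without any network access. This is only a sketch: the cells list and its values are made up, and SimpleNamespace objects stand in for the real td elements because all the expression needs from them is a .text attribute.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the elements soup.select() would return.
# Each object only needs a .text attribute, like a real td element.
cells = [SimpleNamespace(text=t) for t in ["Comments", "97", "90", "3"]]

rest = cells[1:]                          # new list without the first item
numbers = (int(td.text) for td in rest)   # generator expression: nothing is converted yet
total = sum(numbers)                      # sum() pulls the integers out one at a time

# The original one-liner does the same thing in a single expression.
same = sum(int(td.text) for td in cells[1:])

print(total, same)
```

Without the [1:] slice, int("Comments") would raise a ValueError, which is presumably why the first item is skipped in the original code.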
Answered By: tdelaney

You can break down the following assignment…

s = sum(int(td.text) for td in soup.select('td:last-child')[1:])

…into several statements:

all_td = soup.select('td:last-child') # get the last TD element of each TR
rest_td = all_td[1:]  # skip the first TD among those
s = 0  # for accumulating a sum
for td in rest_td:
    val = int(td.text)  # parse the text in the TD as an integer
    s += val  # add that number to the running sum

Now you can step through these statements with a debugger, or add some print calls here and there, to see what’s going on.
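
As a rough illustration of the "add some print calls" idea, here is the expanded version run against a small inline table instead of the real URL. The markup and the names in it are made up for this sketch, on the assumption that the page has a header row whose cells are also td elements.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a header row followed by three data rows.
html = """
<table>
<tr><td>Name</td><td>Comments</td></tr>
<tr><td>Romina</td><td><span class="comments">97</span></td></tr>
<tr><td>Laurie</td><td><span class="comments">90</span></td></tr>
<tr><td>Bayli</td><td><span class="comments">3</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

all_td = soup.select("td:last-child")   # last td of each row, header row included
print([td.text for td in all_td])       # shows what was selected before slicing

rest_td = all_td[1:]                    # drop the header cell
s = 0
for td in rest_td:
    val = int(td.text)                  # .text also covers text inside the nested span
    s += val
    print(val, s)                       # current value and running sum

print(s)
```

The first print makes it visible that the header cell is part of the selection, which is the item the [1:] slice removes.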

Answered By: trincot