Why is element.txt empty when I get the HTML content?

Question

The goal of the program is simply to get the headlines from tagesschau.de.
It worked at first, but after a few runs it returns nothing.

import requests
from bs4 import BeautifulSoup

headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/86.0.4240.111 Safari/537.36',
            'Host': 'www.tagesschau.de',
            'Referer': 'https://www.tagesschau.de/'
          }

# get and parse the HTML of tagesschau.de
URL = 'https://www.tagesschau.de/'
html = requests.get(URL, headers=headers)
html_parse = BeautifulSoup(html.content, 'lxml')

# find all headlines on the homepage
elements = html_parse.find_all('h4',{'class':'headline'})
for element in elements:
    print(element.txt)

It printed nothing but None:

None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None

But when I print element instead of element.txt, the output looks right:

<h4 class="headline"><a href="/multimedia/livestreams/livestream3/">Live: tagesschau24</a></h4>
<h4 class="headline"><a href="/100sekunden/">100 Sekunden</a></h4>
<h4 class="headline"><a href="/multimedia/sendung/ts-39833.html">tagesschau 20 Uhr</a></h4>
<h4 class="headline"><a href="/multimedia/sendung/ts-39841.html">Letzte Sendung</a></h4>
<h4 class="headline">++ Fauci warnt vor "einer Menge Leid" ++</h4>
<h4 class="headline">Weniger Party, mehr Wellness</h4>
<h4 class="headline">November-Lockdown kostet 19 Milliarden</h4>

This confuses me. Why does it happen?

Asked By: SILIN YANG

Answer 1

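element.txt returns None because BeautifulSoup treats an unknown attribute as a search for a child tag with that name, and the headlines contain no <txt> tag. To get the inner text of an element, use .text: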

for element in elements:
    print(element.text)
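
The difference between .txt and .text can be checked without any network access. The following is a minimal sketch that parses one of the headlines from the question as an inline string; the variable names are illustrative, and it uses the built-in 'html.parser' instead of 'lxml' only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# One headline copied from the question's output, parsed from a string.
html = '<h4 class="headline"><a href="/100sekunden/">100 Sekunden</a></h4>'
soup = BeautifulSoup(html, "html.parser")
element = soup.find("h4", class_="headline")

# Unknown attribute: BeautifulSoup looks for a child <txt> tag and finds none.
missing = element.txt

# The same mechanism is what makes element.a return the child <a> tag.
link = element.a

# .text is a real property; get_text(strip=True) also trims whitespace.
text = element.text
stripped = element.get_text(strip=True)

print(missing)
print(link["href"])
print(text)
```

Because attribute access doubles as child-tag lookup, a misspelled attribute fails silently with None rather than raising AttributeError.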

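For the inner HTML, use .decode_contents(). Note that .html does not work: BeautifulSoup reads it as a search for a child <html> tag and returns None.

for element in elements:
    print(element.decode_contents())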
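
To get markup rather than text from a tag, BeautifulSoup offers decode_contents() for the inner HTML and str() for the tag including itself. This is a small sketch on an inline headline from the question; the variable names are illustrative and 'html.parser' is assumed so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

html = '<h4 class="headline"><a href="/100sekunden/">100 Sekunden</a></h4>'
element = BeautifulSoup(html, "html.parser").h4

# Markup of the children only (the equivalent of innerHTML).
inner_html = element.decode_contents()

# Markup of the tag itself plus its children (the equivalent of outerHTML).
outer_html = str(element)

# .html is a child-tag lookup, and an <h4> has no <html> child.
not_inner_html = element.html

print(inner_html)
print(outer_html)
print(not_inner_html)
```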

Answered By: Wasif
