Get all strings from HTML
Question:
I send GET requests to different sites and get HTML pages in response. How can I get only the strings from an HTML page?
I mean all strings in general (the ones colored white in my screenshot).

I understand how to get tags such as "div", "code", and "a".
But I need to get all the lines that are painted white in the developer tools.
Answers:
To get all the human-readable text of the HTML <body>, you can use .get_text(). To get rid of redundant whitespace, set the strip parameter and join the pieces with a single space:
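import bs4

# `response` is the object returned by your GET request (e.g. requests.get(url))
soup = bs4.BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly to avoid a warning
text = soup.body.get_text(' ', strip=True)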
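If you want the strings as a list rather than one joined string, BeautifulSoup also exposes soup.stripped_strings, which yields each text node with surrounding whitespace removed. As a rough alternative for cases where installing BeautifulSoup isn't an option, the sketch below does something similar with only the standard library's html.parser. The names TextCollector, extract_strings, and the sample markup are made up for illustration, and skipping <script> and <style> contents is an assumption about what "the white strings" should include.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every non-empty text node, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}  # assumption: their contents are not wanted

    def __init__(self):
        super().__init__()
        self.strings = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is non-empty after trimming whitespace.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.strings.append(text)


def extract_strings(html):
    """Return the text nodes of an HTML document as a list of strings."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.strings


# Hypothetical sample page; in practice pass response.text instead.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><div>Hello <a href='#'>world</a></div>"
    "<script>var x = 1;</script><p>  Bye </p></body></html>"
)
print(extract_strings(sample))  # ['Hello', 'world', 'Bye']
```

Joining the result with ' '.join(...) gives roughly what get_text(' ', strip=True) returns, though real-world pages with malformed markup are handled more gracefully by BeautifulSoup.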