Scraping the text of h3 tags from an HTML page

Question:

import requests
from bs4 import BeautifulSoup

url = 'http://www.columbia.edu/~fdc/sample.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.findAll('h3')
print(items)

I get this output:
[<h3 id="contents">CONTENTS</h3>, <h3 id="basics">1. Creating a Web Page</h3>, <h3 id="syntax">2. HTML Syntax</h3>
How can I get this output instead?
[CONTENTS, 1. Creating a Web Page, 2. HTML Syntax…

Asked By: 3Vw

Answers:

If you want a list of the text inside the h3 tags, iterate over the tags and keep only each tag's text.

import requests
from bs4 import BeautifulSoup

url = 'http://www.columbia.edu/~fdc/sample.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# find_all is the current name for findAll; .text returns a tag's text without the markup
items = [h3.text for h3 in soup.find_all('h3')]
print(items)

Output:

['CONTENTS', '1. Creating a Web Page', '2. HTML Syntax', '3. Special Characters', '4. Converting Plain Text to HTML', '5. Effects', '6. Lists', '7. Links', '8. Tables', '9. Viewing Your Web Page', '10. Installing Your Web Page on the Internet', '11. Where to go from here', '12. Postscript: Cell Phones']
Answered By: E Joseph
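
As a supplementary sketch (not part of the answer above), the same idea can be wrapped in a small function and tried without a network request. The names `SAMPLE_HTML` and `heading_texts` are hypothetical, and the inline HTML is an assumed stand-in for the downloaded page, not its real content. The whitespace normalization is an optional extra that may help when a heading spans several lines.

```python
from bs4 import BeautifulSoup

# Assumption: a small inline document standing in for the real page,
# so the example runs without downloading anything.
SAMPLE_HTML = """
<html><body>
<h3 id="contents">CONTENTS</h3>
<h3 id="basics">1. Creating a <b>Web</b> Page</h3>
<h3 id="syntax">
  2. HTML Syntax
</h3>
</body></html>
"""


def heading_texts(html, tag="h3"):
    """Return the text of every `tag` element, without the markup."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() concatenates all text nested inside the tag;
    # split() + join() collapses newlines and repeated spaces.
    return [" ".join(el.get_text().split()) for el in soup.find_all(tag)]


if __name__ == "__main__":
    print(heading_texts(SAMPLE_HTML))
```

To use it on the page from the question, pass `response.text` in place of `SAMPLE_HTML`.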