How to select individual elements one by one when web scraping with Python

Question:

I want only specific elements from this HTML, for example the first <h3> (h3[0]) and the second <h6> (h6[1]):

<div class="span16">
    <h3>Shroot, Stephanie</h3>
    <h6>Chemistry</h6>
    <h6>December 2021</h6>
    <p>Thesis or dissertation
    <h3>Shroot</h3>

I use BeautifulSoup and a for loop to get the information:

import requests
from bs4 import BeautifulSoup

# `line` and `headers` come from the surrounding for loop and script (not shown)
url = line.strip()
response = requests.get(url, headers=headers)  # one request instead of two
r_html = response.text
r_html_sc = response.status_code
soup = BeautifulSoup(r_html, "html.parser")
thesis_infos = soup.find('div', {"class": "span16"})
if thesis_infos is not None:
    thesis_infos_text = thesis_infos.text.strip()
else:
    thesis_infos_text = " "
print(thesis_infos_text)
thesis_infos_lines = thesis_infos_text.splitlines()  # str has no readlines()
author1_1 = thesis_infos_lines[0]
year1_1 = thesis_infos_lines[2]
Asked By: M. Zain Aldin
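A note on the line-splitting approach in the question: `readlines()` is a method of file objects, not of `str`; the string equivalent is `str.splitlines()`. The text of a tag usually also carries indentation and sometimes blank lines, which shift the indices. A minimal sketch, assuming the div's text looks roughly like the hypothetical sample below (the exact whitespace depends on the page):

```python
# Hypothetical sample: roughly what thesis_infos.text.strip() might return
# for the question's HTML (indentation kept, one blank line added to show
# why filtering matters).
thesis_infos_text = (
    "Shroot, Stephanie\n"
    "    Chemistry\n"
    "\n"
    "    December 2021\n"
    "    Thesis or dissertation\n"
    "    Shroot"
)

# splitlines() breaks the string on line boundaries; strip each line and
# skip empty ones so the positions of the real entries stay stable.
thesis_infos_lines = [ln.strip() for ln in thesis_infos_text.splitlines() if ln.strip()]

author1_1 = thesis_infos_lines[0]
year1_1 = thesis_infos_lines[2]
print(author1_1)  # Shroot, Stephanie
print(year1_1)    # December 2021
```

Selecting tags directly, as in the answers, avoids depending on line positions at all.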


Answers:

Edit:
The easiest way is probably BeautifulSoup's find_all(), which returns a list of matching tags in document order that you can index:

soup.find_all("h3")[0]
soup.find_all("h6")[1]

Here is a short example, filtering for links on google.com:

import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.google.com").text
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")  # findAll is the older alias of find_all
print(links[0])

Is this what you are looking for?

import re

code = """
<div class="span16">
    <h3>Shroot, Stephanie</h3>
    <h6>Chemistry</h6>
    <h6>December 2021</h6>
    <p>Thesis or dissertation
    <h3>Shroot</h3>
"""

h3_matches = re.findall(".*<h3>(.+)<\/h3>", code)
h6_matches = re.findall(".*<h6>(.+)<\/h6>", code)
print(h3_matches[0])
print(h6_matches[1])

output:

Shroot, Stephanie
December 2021
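One caveat with patterns like these: a greedy `(.+)` captures across tags when two of them share a line, while a non-greedy `(.+?)` stops at the first closing tag. A small sketch comparing the two on a made-up one-line input:

```python
import re

# Hypothetical input: both <h6> tags on a single line
one_line = "<h6>Chemistry</h6><h6>December 2021</h6>"

# Greedy: .+ runs to the end, then backtracks only to the LAST </h6>
greedy = re.findall(r"<h6>(.+)</h6>", one_line)
# Non-greedy: .+? stops at the FIRST </h6> after each <h6>
lazy = re.findall(r"<h6>(.+?)</h6>", one_line)

print(greedy)  # ['Chemistry</h6><h6>December 2021']
print(lazy)    # ['Chemistry', 'December 2021']
```

The raw-string prefix `r"..."` also avoids Python's warning about invalid escape sequences such as `\/`, which is not needed in a regex anyway.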
Answered By: H.Syd
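Applying the regex approach to the div found with BeautifulSoup (requires `import re`):

thesis_infos = soup.find('div', {"class": "span16"})
if thesis_infos is not None:
    code = str(thesis_infos)  # the div's HTML, tags included, as a string
    h3_matches = re.findall(r"<h3>(.+?)</h3>", code)
    h6_matches = re.findall(r"<h6>(.+?)</h6>", code)
    print(h3_matches[0])
    print(h6_matches[1])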
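Regex works for this simple snippet but breaks easily on HTML in general (for example `<h3 class="title">` or text split across lines). As an alternative that needs only the standard library, here is a sketch using `html.parser.HTMLParser` to collect the text of each `<h3>`/`<h6>` in document order. This is a different technique from the answers above, and the class name `TagTextCollector` is hypothetical:

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside selected tags, in order."""

    def __init__(self, tags=("h3", "h6")):
        super().__init__()
        self.tags = set(tags)
        self.current = None              # tag we are currently inside, if any
        self.found = defaultdict(list)   # tag name -> list of texts

    def handle_starttag(self, tag, attrs):
        # attrs are ignored, so <h3 class="x"> matches too
        if tag in self.tags:
            self.current = tag
            self.found[tag].append("")

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current][-1] += data


html = """
<div class="span16">
    <h3>Shroot, Stephanie</h3>
    <h6>Chemistry</h6>
    <h6>December 2021</h6>
    <p>Thesis or dissertation
    <h3>Shroot</h3>
"""

parser = TagTextCollector()
parser.feed(html)
parser.close()

print(parser.found["h3"][0].strip())  # Shroot, Stephanie
print(parser.found["h6"][1].strip())  # December 2021
```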
Answered By: M. Zain Aldin