Get text from an element inside a nested class with BeautifulSoup

Question:

I am new to BeautifulSoup. I usually do web scraping with Scrapy, which uses response.xpath to get the text.

This time, I want to get the article title from an element with the class article-title and the published date from an element with the class meta-posted.

The HTML looks like this:

<div class="col-12 col-md-8">
  <article class="article-main">
    <header class="article-header">
       <h1 class="article-title" style="font-size: 28px !important; font-family: sans-serif !important;">Presentation: Govt pushes CCS/CCUS development in RI upstream sector</h1>
       <div class="article-meta">
         <span class="meta-posted">
                    Monday, August 1 2022 - 04:27PM WIB </span>
       </div>

To get the title, what I have tried is:

title= res.findAll('h1', attrs={'class':'article-title'})

But it still gives me:

[<h1 class="article-title" style="font-size: 28px !important; font-family: sans-serif !important;">Pertagas, Chandra Asri sign gas MoU</h1>]

To get the date, I tried:

date = res.findAll('span', attrs={'class':'meta-posted'})

But it gives me:

[<span class="meta-posted" style="font-size: large">
 </span>,
 <span class="meta-posted" style="font-style: italic">
 </span>,
 <span class="meta-posted">
                     Tuesday, August 2 2022 - 10:53AM WIB
                 </span>]

How should I write the code in order to get only the title and also the date?

Asked By: yangyang

||

Answers:

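findAll returns a list of Tag objects, which is why you see the surrounding markup. Call get_text() on each tag to extract only its text: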

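from bs4 import BeautifulSoup

# html_doc is the page's HTML as a string
soup = BeautifulSoup(html_doc, 'html.parser')

# find_all returns a list of matching tags; get_text() extracts only the text
titles = soup.find_all('h1', attrs={'class': 'article-title'})
for title in titles:
    print(title.get_text(strip=True))

dates = soup.find_all('span', attrs={'class': 'meta-posted'})
for date in dates:
    text = date.get_text(strip=True)
    if text:  # skip the empty meta-posted spans
        print(text)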
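
If only a single title and a single date are wanted, CSS selectors may be a closer match for the "class inside a class" idea. The following is a minimal sketch, not necessarily how the answerer would do it. The sample markup is adapted from the question, and it assumes the empty meta-posted spans sit inside the same article-meta div, which the question does not confirm.

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question. Placing the two empty spans inside
# article-meta is an assumption made for illustration.
html_doc = """
<article class="article-main">
  <header class="article-header">
    <h1 class="article-title">Pertagas, Chandra Asri sign gas MoU</h1>
    <div class="article-meta">
      <span class="meta-posted" style="font-size: large">
      </span>
      <span class="meta-posted" style="font-style: italic">
      </span>
      <span class="meta-posted">
          Tuesday, August 2 2022 - 10:53AM WIB
      </span>
    </div>
  </header>
</article>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select_one returns the first matching tag, or None if nothing matches.
title_tag = soup.select_one("header.article-header h1.article-title")
title = title_tag.get_text(strip=True) if title_tag else None

# Collect the stripped text of every matching span, then drop the empty ones.
dates = [
    span.get_text(strip=True)
    for span in soup.select("div.article-meta span.meta-posted")
]
dates = [d for d in dates if d]
date = dates[0] if dates else None

print(title)
print(date)
```

Scoping the selector to the parent class (header.article-header, div.article-meta) keeps unrelated elements elsewhere on the page from matching, and filtering on the stripped text removes the whitespace-only spans.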
Answered By: msvstl