How to use Beautiful Soup to return the text inside specified tags, separated by a delimiter?
Question:
This is how the data looks in HTML format.
<div class="leader-info"><h4>Director of IR</h4><p>Diane PHILIPS</p></div>,
<div class="leader-info"><h4>Director of Finance</h4><p>Nancy LOPEZ</p></div>,
<div class="leader-info"><h4>Director of HR</h4><p>George SANTOZ</p></div>
I used the code below to extract the text.
for leader_list in soup.findAll(attrs={'class':'leader-info'}):
    print(leader_list.get_text())
This is what I get.
Director of IRDiane PHILIPS
Director of FinanceNancy LOPEZ
Director of HRGeorge SANTOZ
My question is: how do I place a pipe delimiter between the "h4" and "p" tag texts, like this?
Director of IR|Diane PHILIPS
Director of Finance|Nancy LOPEZ
Director of HR|George SANTOZ
Answers:
Use .get_text() with the separator= parameter:
from bs4 import BeautifulSoup
html_doc = '''
<div class="leader-info"><h4>Director of IR</h4><p>Diane PHILIPS</p></div>
<div class="leader-info"><h4>Director of Finance</h4><p>Nancy LOPEZ</p></div>
<div class="leader-info"><h4>Director of HR</h4><p>George SANTOZ</p></div>'''
soup = BeautifulSoup(html_doc, 'html.parser')
# separator= is placed between the text nodes; strip=True trims whitespace around each one
for i in soup.select('.leader-info'):
    print(i.get_text(strip=True, separator='|'))
Prints:
Director of IR|Diane PHILIPS
Director of Finance|Nancy LOPEZ
Director of HR|George SANTOZ
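One caveat worth noting: separator= is inserted between every text node inside the div, so if a name ever contained inline markup (for example a <b> tag), an extra pipe would appear inside the name. A minimal sketch of an alternative, assuming each div holds exactly one h4 and one p, is to read the two tags separately and join them yourself. The helper name leader_rows and the <b> markup in the sample are hypothetical, added only for illustration:

```python
from bs4 import BeautifulSoup

# Sample data; the <b> tag in the first row is an assumption made
# to show the nested-markup case, it is not in the original HTML.
html_doc = '''
<div class="leader-info"><h4>Director of IR</h4><p>Diane <b>PHILIPS</b></p></div>
<div class="leader-info"><h4>Director of Finance</h4><p>Nancy LOPEZ</p></div>'''


def leader_rows(html, sep='|'):
    """Return one 'title<sep>name' string per .leader-info div (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for div in soup.select('.leader-info'):
        title = div.find('h4')
        name = div.find('p')
        if title is None or name is None:
            continue  # skip divs that lack either tag
        # Join text nodes inside each tag with a space, so nested
        # markup does not introduce an extra delimiter.
        rows.append(sep.join([
            title.get_text(separator=' ', strip=True),
            name.get_text(separator=' ', strip=True),
        ]))
    return rows


for row in leader_rows(html_doc):
    print(row)
```

With the plain HTML from the question, both approaches should give the same result; the difference only shows up when a tag contains nested elements.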