Extract all text within a tag & save to dictionary using BeautifulSoup

Question:

I have an XML file that looks a bit like this:

<article id = '1'> 
  <p> This is </p> 
  <p> example A </p>
</article>

<article id = '2'> 
  <p> This is </p> 
  <p> example B </p>
</article>

I would like to create a dictionary that looks like this:

{1: 'This is example A', 2: 'This is example B'}

with the keys being the 'id' attribute of each article tag. What is the best way to go about doing this using Beautiful Soup?

Asked By: Dieu94

Answers:

This is how I would do it:

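from bs4 import BeautifulSoup

output = {}

# If you're getting your XML from the web, skip this step:
with open("xml_file.xml", mode="r") as f:
    data = f.read()

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")

for article in soup.find_all('article'):
    # Key on the tag's id attribute rather than its position in the file;
    # split() + join() collapses all whitespace, newlines included
    output[int(article['id'])] = ' '.join(article.text.split())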
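
If you want to try this without a file on disk, here is a minimal sketch that parses the sample from the question straight from a string. The function name articles_to_dict and the SAMPLE constant are made up for illustration. It assumes every article tag has a numeric id attribute, and it uses Python's built-in html.parser, which is a choice on my part rather than a requirement.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
SAMPLE = """
<article id = '1'>
  <p> This is </p>
  <p> example A </p>
</article>

<article id = '2'>
  <p> This is </p>
  <p> example B </p>
</article>
"""


def articles_to_dict(markup):
    """Map each <article> id to its whitespace-normalised text.

    Hypothetical helper: assumes every <article> has a numeric id.
    """
    # html.parser ships with Python and accepts several top-level elements
    soup = BeautifulSoup(markup, "html.parser")
    return {
        # split() + join() collapses newlines and repeated spaces
        int(article["id"]): " ".join(article.get_text().split())
        for article in soup.find_all("article")
    }


print(articles_to_dict(SAMPLE))
```

For the sample above this should give the dictionary asked for in the question. If your ids are not numeric, drop the int() call and keep them as strings.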

Hope this helps!

Answered By: Kawish Qayyum