Replace 'class' attribute with 'href' in Python
Question:
I am trying to make some changes to some HTML via a Python script I am writing. I have been struggling with a simple replacement for the last few days, without any success.
<a class="PageNo">1</a>
--> <a href="#PageNo1">1</a>
<a class="PageNo">2</a>
--> <a href="#PageNo2">2</a>
<a class="PageNo">12</a>
--> <a href="#PageNo12">12</a>
<a class="PageNo">20</a>
--> <a href="#PageNo20">20</a>
I simply can’t replace the "a class" with the "a href". I’ve tried something like this:
html_content = html_content.replace("a class", "a href")
I also tried doing the replacement via BeautifulSoup, but with no success, and I couldn’t find anything similar on Stack Overflow either.
Any ideas?
Answers:
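Here is a solution using BeautifulSoup. A plain str.replace("a class", "a href") only renames the attribute, giving <a href="PageNo">1</a>; it cannot build the "#PageNo1" target from the link text: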
from bs4 import BeautifulSoup
s = """
<a class="PageNo">1</a>
<a class="PageNo">2</a>
<div>
<a class="PageNo">25</a>
</div>
"""
soup = BeautifulSoup(s, 'html.parser')
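# Select only the anchors that carry the PageNo class, so other links are left alone
for a in soup.select("a.PageNo"):
    content = a.get_text(strip=True)  # the page number shown in the link
    del a.attrs['class']
    a.attrs['href'] = f"#PageNo{content}"
print(soup)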
Output:
<a href="#PageNo1">1</a>
<a href="#PageNo2">2</a>
<div>
<a href="#PageNo25">25</a>
</div>
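If the anchors always have exactly the shape shown in the question, a pattern-based replacement with the standard library may also be enough. The following is only a sketch under that assumption: the helper name link_page_numbers and the PAGE_LINK pattern are made up for illustration, and anything with extra attributes, different quoting, or non-numeric text will not match and is left unchanged. For less regular HTML, a parser such as BeautifulSoup is the safer choice.

```python
import re

# Assumption: every target anchor looks exactly like <a class="PageNo">N</a>,
# where N is one or more digits. Anything else is left untouched.
PAGE_LINK = re.compile(r'<a class="PageNo">(\d+)</a>')


def link_page_numbers(html_content: str) -> str:
    """Rewrite <a class="PageNo">N</a> as <a href="#PageNoN">N</a>."""
    # \1 is the captured page number, reused in the href and in the link text.
    return PAGE_LINK.sub(r'<a href="#PageNo\1">\1</a>', html_content)


print(link_page_numbers('<a class="PageNo">12</a>'))
# prints: <a href="#PageNo12">12</a>
```

Unlike str.replace, the capture group lets the replacement reuse the page number, which is what the original attempt was missing.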