Replace 'class' attribute with 'href' in Python

Question:

I am trying to make some changes to some HTML via a Python script I am writing. For the last few days I have been struggling to do a simple replacement, without any success:

<a class="PageNo">1</a> -> <a href="#PageNo1">1</a>

<a class="PageNo">2</a> -> <a href="#PageNo2">2</a>

<a class="PageNo">12</a> -> <a href="#PageNo12">12</a>

<a class="PageNo">20</a> -> <a href="#PageNo20">20</a>
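If the markup always looks exactly like the examples above, a regular expression that captures the page number and reuses it twice is one possible approach. This is a minimal sketch, assuming every link has the literal form <a class="PageNo">N</a> (double quotes, no other attributes, no extra whitespace); regex_convert is a hypothetical helper name. For HTML that varies, a parser such as BeautifulSoup is the more robust choice.

```python
import re

# Assumption: links appear exactly as <a class="PageNo">N</a> where N is digits.
PAGE_LINK = re.compile(r'<a class="PageNo">(\d+)</a>')


def regex_convert(html: str) -> str:
    """Hypothetical helper: rewrite each PageNo link to an href anchor."""
    # \1 is the captured page number, inserted into both the href and the text.
    return PAGE_LINK.sub(r'<a href="#PageNo\1">\1</a>', html)


print(regex_convert('<a class="PageNo">12</a>'))
```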

I simply can't replace "a class" with "a href". I've tried something like this:
html_content = html_content.replace("a class", "a href")
and I've also tried doing the replacement with BeautifulSoup, but with no success, and I couldn't find anything similar on StackOverflow either.
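A plain string replacement cannot produce the desired result because str.replace only swaps literal text: it has no way to pick up the page number between the tags and move it into the attribute. A short demonstration on one of the example links:

```python
html_content = '<a class="PageNo">1</a>'

# str.replace swaps the literal text "a class" for "a href" and nothing else,
# so the attribute value stays "PageNo" and the number is never copied into it.
naive = html_content.replace("a class", "a href")
print(naive)  # prints <a href="PageNo">1</a>, not <a href="#PageNo1">1</a>
```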

Any ideas?

Asked By: taleporos


Answers:

Here is a solution:

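from bs4 import BeautifulSoup

s = """
<a class="PageNo">1</a>
<a class="PageNo">2</a>
<div>
    <a class="PageNo">25</a>
</div>
"""

soup = BeautifulSoup(s, 'html.parser')

# Only touch <a> tags with the PageNo class; other links are left alone
# (and a link without a class would otherwise raise KeyError on del).
for a in soup.select("a.PageNo"):
    content = a.get_text()           # the page number, e.g. "1"
    del a["class"]                   # remove the class attribute
    a["href"] = f"#PageNo{content}"  # add the href anchor

print(soup)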

Output:

<a href="#PageNo1">1</a>
<a href="#PageNo2">2</a>
<div>
    <a href="#PageNo25">25</a>
</div>
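Since the question works with a string called html_content, it may be convenient to wrap the same idea in a function that returns the modified HTML as a string. This is a sketch, not part of the original answer; convert_page_links is a hypothetical name. It uses find_all with class_="PageNo", which matches any <a> whose class list contains PageNo, and leaves all other tags untouched.

```python
from bs4 import BeautifulSoup


def convert_page_links(html: str) -> str:
    """Hypothetical helper: turn <a class="PageNo">N</a> into <a href="#PageNoN">N</a>."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches tags whose (multi-valued) class attribute includes "PageNo"
    for a in soup.find_all("a", class_="PageNo"):
        page = a.get_text(strip=True)  # page number without surrounding whitespace
        del a["class"]                 # drop the class attribute entirely
        a["href"] = f"#PageNo{page}"   # point the link at the page anchor
    return str(soup)
```

Usage: html_content = convert_page_links(html_content)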
Answered By: Tom McLean