How can I get the HTML tags from an input rather than the text?

Question:

I’m trying to make a program that takes input and then outputs the HTML tags, but so far I’ve only managed to do the opposite.

import re

text = '<p>I want this bit removed</p>'
tags = re.search('>(.*)<', text)

print(tags.group(1))

At the moment, running this removes the HTML tags and keeps the text. I want the output to be ['p','/p'] instead. How can I do this? I also want it to work for any input.

Also, if possible, I’d like to adapt this to a for loop.

Asked By: JJ42


Answers:

Just change the regex to look for the text inside the < > instead, and use re.findall so that every match is returned rather than only the first one.

import re

text = '<p>I want this bit removed</p>'
tags = re.findall('<([^>]*)>', text) # [^>] means anything except a `>`

print(tags) # re.findall returns a list of strings, so this prints ['p', '/p']
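
For the for-loop version asked about in the question, one possible sketch is below. It uses the same pattern with re.finditer and collects each captured group in a loop. The function name extract_tags is made up for illustration. Note that the pattern captures everything between < and >, so attributes are included in the result.

```python
import re

def extract_tags(text):
    """Return the contents of every <...> pair in text, in order."""
    tags = []
    # finditer yields one match object per <...> pair found in the text
    for match in re.finditer(r'<([^>]*)>', text):
        tags.append(match.group(1))  # group(1) is the part between < and >
    return tags

print(extract_tags('<p>I want this bit removed</p>'))  # ['p', '/p']
```

With input such as '<div class="x"><br/></div>' this returns ['div class="x"', 'br/', '/div'], so the results may need further splitting if only the tag names are wanted.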
Answered By: imbuedHope
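
If the input can be arbitrary HTML, a regex may pick up things that are not really tags (for example text inside comments). As an alternative sketch, not the approach from the answer above, the standard library's html.parser module can report start and end tags as it parses. The names TagCollector and collect_tags are hypothetical, and the '/' prefix on end tags is only there to mimic the ['p','/p'] format from the question.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collects tag names in document order; end tags get a '/' prefix."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # tag is the lowercased tag name; attributes arrive separately in attrs
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append('/' + tag)

def collect_tags(text):
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return parser.tags

print(collect_tags('<p>I want this bit removed</p>'))  # ['p', '/p']
```

Unlike the regex, this returns only the tag name, so '<a href="x">link</a>' gives ['a', '/a'].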