How can I get the HTML tags from an input rather than the text?

Question

I’m trying to write a program that takes input and outputs the HTML tags it contains. So far I’ve only managed to do the opposite.

import re

text = '<p>I want this bit removed</p>'
tags = re.search('>(.*)<', text)

print(tags.group(1))

At the moment, if I run this, it strips the HTML tags and prints the text between them. I want the output to be ['p','/p'] instead. How can I do this? I also want it to work for any input.

Also, if possible, I’d like to adapt this to a for loop.

Asked By: JJ42

Answer 1

Just change the regex to look for the text inside the < > instead.

import re

text = '<p>I want this bit removed</p>'
tags = re.findall('<([^>]*)>', text) # [^>] means anything except a `>`

print(tags) # re.findall returns a list of the captured groups: ['p', '/p']
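
Since the question also asks for a for loop and for something that adapts to any input, here is a minimal sketch of the same pattern wrapped in a function. The name `extract_tags` is made up for illustration, and `re.finditer` is used in place of `re.findall` only so there is something to loop over.

```python
import re

def extract_tags(text):
    """Collect whatever sits between each '<' and the next '>' in text."""
    tags = []
    # finditer yields one match object per <...> pair, left to right
    for match in re.finditer(r'<([^>]*)>', text):
        tags.append(match.group(1))  # group(1) is the part inside the brackets
    return tags

print(extract_tags('<p>I want this bit removed</p>'))  # ['p', '/p']
```

To run it on user input, pass `input()` as the argument instead of the hard-coded string. Note that the pattern captures everything inside the brackets, so a tag with attributes such as `<a href="x">` comes back as `a href="x"`, not just `a`. For real-world HTML a proper parser is usually a safer choice than a regex.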

Answered By: imbuedHope
