How to remove HTML tags to get the text

Question:

I have text like this:

text = 
<option value="tfa_4472" id="tfa_4472" class="">helo 1</option>
<option value="tfa_4473" id="tfa_4473" class="">helo 2</option>
<option value="tfa_4474" id="tfa_4474" class="">helo 3</option>
<option value="tfa_4475" id="tfa_4475" class="">helo 4</option>
<option value="tfa_4476" id="tfa_4476" class="">helo 5</option>

I want to get a result like this:
my_list = get_text(text)

helo 1
helo 2
helo 3
helo 4
helo 5

Thank you

Asked By: Hoàng Tuyến


Answers:

In JavaScript, if the text arrives as a string, you can strip the tags with a regular expression. Searching for "strip HTML tags" turns up this one from css-tricks; wrap it in a function with the name you need. Note that it returns one string with the tags removed, so split the result on newlines if you need a list:

const get_text = (text) => {
    // Remove anything that looks like a tag: "<", one or more non-">" characters, then ">"
    return text.replace(/(<([^>]+)>)/gi, "");
};
Answered By: Matvey Andreyev
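
The same regex idea can be sketched in Python for readers who want the list from the question rather than one string. This is a minimal sketch, assuming the input is a string with one option per line; get_text is the name used in the question, TAG_RE is a hypothetical helper name, and re.sub stands in for JavaScript's replace. Regex stripping is fragile on real-world HTML (comments, scripts, a ">" inside an attribute value), so treat it as a quick fix for simple fragments like this one.

```python
import re

# Matches "<", then one or more characters that are not ">", then ">".
TAG_RE = re.compile(r"<[^>]+>")


def get_text(text):
    """Strip tags from each line and return the non-empty results as a list."""
    stripped = (TAG_RE.sub("", line).strip() for line in text.splitlines())
    return [line for line in stripped if line]


text = """<option value="tfa_4472" id="tfa_4472" class="">helo 1</option>
<option value="tfa_4473" id="tfa_4473" class="">helo 2</option>"""

my_list = get_text(text)
print(my_list)
```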

In JavaScript, you can select the option tags with querySelectorAll, loop over the nodes, and append each node's innerText to myList.

const myList = [];
const nodes = document.querySelectorAll('option');

nodes.forEach((node) => {
    // push() appends to the array; += would turn it into one concatenated string
    myList.push(node.innerText);
});

console.log(myList);
Answered By: Joel

Python:

from bs4 import BeautifulSoup


myhtml = """<option value="tfa_4472" id="tfa_4472" class="">helo 1</option>
<option value="tfa_4473" id="tfa_4473" class="">helo 2</option>
<option value="tfa_4474" id="tfa_4474" class="">helo 3</option>
<option value="tfa_4475" id="tfa_4475" class="">helo 4</option>
<option value="tfa_4476" id="tfa_4476" class="">helo 5</option>"""


soup = BeautifulSoup(myhtml, 'html.parser')

my_text = []
for text_tag in soup.find_all("option", {'class': ''}):
    my_text.append(text_tag.get_text())

my_text
['helo 1', 'helo 2', 'helo 3', 'helo 4', 'helo 5']
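
If installing BeautifulSoup is not an option, the same result can probably be reached with the standard library alone. The following is a sketch under stated assumptions, not part of the answer above: OptionTextParser is a hypothetical class name, get_text is the name from the question, and the input is assumed to be a fragment of option tags like the one shown. It uses html.parser.HTMLParser from the standard library.

```python
from html.parser import HTMLParser


class OptionTextParser(HTMLParser):
    """Collects the text found inside each <option> element."""

    def __init__(self):
        super().__init__()
        self.texts = []          # one entry per <option>
        self._in_option = False  # True while inside an <option>
        self._buffer = []        # text pieces of the current <option>

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._in_option = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_option:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._in_option:
            self.texts.append("".join(self._buffer).strip())
            self._in_option = False


def get_text(text):
    """Return the text of every <option> in the given HTML string."""
    parser = OptionTextParser()
    parser.feed(text)
    parser.close()
    return parser.texts


sample = """<option value="tfa_4472" id="tfa_4472" class="">helo 1</option>
<option value="tfa_4473" id="tfa_4473" class="">helo 2</option>"""

print(get_text(sample))
```

Unlike the regex approach, this only collects text that sits inside option tags, so surrounding markup or stray text between the tags is ignored.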
