python .replace() regex

Question:

I am trying to do a grab everything after the '</html>' tag and delete it, but my code doesn’t seem to be doing anything. Does .replace() not support regex?

z.write(article.replace('</html>.+', '</html>'))
Asked By: user1442957

||

Answers:

No. Regular expressions in Python are handled by the re module.

article = re.sub(r'(?is)</html>.+', '</html>', article)

In general:

text_after = re.sub(regex_search_term, regex_replacement, text_before)

You can use the re module for regexes, but regexes are probably overkill for what you want. I might try something like

z.write(article[:article.index("</html>") + 7]

This is much cleaner, and should be much faster than a regex based solution.

Answered By: Julian

In order to replace text using regular expression use the re.sub function:

sub(pattern, repl, string[, count, flags])

It will replace non-everlaping instances of pattern by the text passed as string. If you need to analyze the match to extract information about specific group captures, for instance, you can pass a function to the string argument. more info here.

Examples

>>> import re
>>> re.sub(r'a', 'b', 'banana')
'bbnbnb'

>>> re.sub(r'/d+', '/{id}', '/andre/23/abobora/43435')
'/andre/{id}/abobora/{id}'
Answered By: Andre Pena

For this particular case, if using re module is overkill, how about using split (or rsplit) method as

se='</html>'
z.write(article.split(se)[0]+se)

For example,

#!/usr/bin/python

article='''<html>Larala
Ponta Monta 
</html>Kurimon
Waff Moff
'''
z=open('out.txt','w')

se='</html>'
z.write(article.split(se)[0]+se)

outputs out.txt as

<html>Larala
Ponta Monta 
</html>
Answered By: norio
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.