python .replace() regex


I am trying to do a grab everything after the '</html>' tag and delete it, but my code doesn’t seem to be doing anything. Does .replace() not support regex?

z.write(article.replace('</html>.+', '</html>'))
Asked By: user1442957



No. Regular expressions in Python are handled by the re module.

article = re.sub(r'(?is)</html>.+', '</html>', article)

In general:

text_after = re.sub(regex_search_term, regex_replacement, text_before)

You can use the re module for regexes, but regexes are probably overkill for what you want. I might try something like

z.write(article[:article.index("</html>") + 7]

This is much cleaner, and should be much faster than a regex based solution.

Answered By: Julian

In order to replace text using regular expression use the re.sub function:

sub(pattern, repl, string[, count, flags])

It will replace non-everlaping instances of pattern by the text passed as string. If you need to analyze the match to extract information about specific group captures, for instance, you can pass a function to the string argument. more info here.


>>> import re
>>> re.sub(r'a', 'b', 'banana')

>>> re.sub(r'/d+', '/{id}', '/andre/23/abobora/43435')
Answered By: Andre Pena

For this particular case, if using re module is overkill, how about using split (or rsplit) method as


For example,


Ponta Monta 
Waff Moff


outputs out.txt as

Ponta Monta 
Answered By: norio
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.