Python multiple repeat Error

Question:

I’m trying to determine whether a term appears in a string.
The term must be preceded and followed by a space, and a standard suffix is also allowed.
Example:

term: google
string: "I love google!!! "
result: found

term: dog
string: "I love dogs "
result: found

I’m trying the following code:

regexPart1 = "\s"
regexPart2 = "(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)

and get the error:

raise error("multiple repeat")
sre_constants.error: multiple repeat

Update
Real code that fails:

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)

On the other hand, the following term passes smoothly (+ instead of ++)

term = 'lg incite" OR author:"http+www.dealitem.com" OR "for sale'
Asked By: Presen


Answers:

The problem is that, in a non-raw string, \" is ".

You get lucky with all of your other unescaped backslashes: \s is the same as \\s, not s; \( is the same as \\(, not (, and so on. But you should never rely on getting lucky, or assume that you know the whole list of Python escape sequences by heart.

Either print out your string and escape the backslashes that get lost (bad), escape all of your backslashes (OK), or just use raw strings in the first place (best).
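To see the difference for yourself, a quick check like the following may help. It is only an illustrative sketch; the variable names are made up for the example.

```python
# Illustrative sketch: how Python treats \" with and without the r prefix.
non_raw = "\""   # Python consumes the backslash: one character, "
raw = r"\""      # the raw string keeps it: two characters, \ and "

print(len(non_raw), len(raw))

# Doubling the backslash in a normal string gives the same text as a raw string.
same = ("\\s" == r"\s")
print(same)
```

The regex engine only ever sees the string after Python has processed its escapes, which is why the raw form is the safer habit.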


That being said, your regexp as posted won’t match some expressions that it should, but it will never raise that "multiple repeat" error. Clearly, your actual code is different from the code you’ve shown us, and it’s impossible to debug code we can’t see.


Now that you’ve shown a real reproducible test case, that’s a separate problem.

You’re searching for terms that may have special regexp characters in them, like this:

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'

That p++ in the middle of a regexp means “1 or more of 1 or more of the letter p” (in other words, the same as “1 or more of the letter p”) in some regexp languages, “always fail” in others, and “raise an exception” in others. Python’s re falls into the last group, at least in versions before 3.11 (newer versions read ++ as a possessive quantifier, which still won’t match a literal ++). In fact, you can test this in isolation:

>>> re.compile('p++')
error: multiple repeat

If you want to put random strings into a regexp, you need to call re.escape on them.
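As a rough sketch of what that looks like with the term from the question (the sample text being searched is made up for illustration):

```python
import re

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'

# re.escape backslash-escapes the regex metacharacters in the term,
# so "++" and "." are matched literally instead of being parsed as syntax.
escaped = re.escape(term)
pattern = re.compile(r"\s" + escaped + r"\s", re.IGNORECASE)

# Hypothetical input: the term surrounded by spaces.
text = ' x lg incite" OR author:"http++www.dealitem.com" OR "for sale y'
match = pattern.search(text)
print(match is not None)
```

Only the user-supplied part goes through re.escape; the surrounding \s pieces are real regex syntax and must stay unescaped.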


One more problem (thanks to Ωmega):

. in a regexp means “any character”. So ,|.|;|: (I’ve just extracted a short fragment of your longer alternation chain) means “a comma, or any character, or a semicolon, or a colon”… which is the same as “any character”. You probably wanted to escape the dot, as \. instead.
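A small experiment with just that fragment shows the effect. This is a sketch for illustration; the variable names are not from the question.

```python
import re

unescaped = re.compile(r",|.|;|:")   # bare dot: any character
escaped = re.compile(r",|\.|;|:")    # escaped dot: a literal "."

any_char_ok = bool(unescaped.fullmatch("x"))   # "x" is accepted by the bare dot
letter_ok = bool(escaped.fullmatch("x"))       # "x" is rejected once the dot is escaped
dot_ok = bool(escaped.fullmatch("."))          # a real "." is still accepted

print(any_char_ok, letter_ok, dot_ok)
```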


Putting all three fixes together:

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + re.escape(term) + regexPart2 , re.IGNORECASE)

As Ωmega also pointed out in a comment, you don’t need to use a chain of alternations if they’re all one character long; a character class will do just as well, more concisely and more readably.
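One possible shape for that, as a sketch only: the single-character suffixes move into a character class, while the multi-character ones (s, 's, !+, ?+) stay as alternatives. The helper name term_pattern is hypothetical, not something from the question.

```python
import re

def term_pattern(term):
    """Hypothetical helper: build the search pattern for one term."""
    # Multi-character suffixes stay as alternatives; the single-character
    # ones are collected in a character class, where . ( ) need no escaping.
    suffix = r"(?:s|'s|!+|\?+|[,.;:()\"])?"
    return re.compile(r"\s" + re.escape(term) + suffix + r"\s", re.IGNORECASE)

found_google = bool(term_pattern("google").search("I love google!!! "))
found_dogs = bool(term_pattern("dog").search("I love dogs "))
found_dogma = bool(term_pattern("dog").search("I love dogma "))

print(found_google, found_dogs, found_dogma)
```

The first two inputs are the examples from the question; the third is an extra check that a longer word starting with the term is not reported as found.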

And I’m sure there are other ways this could be improved.

Answered By: abarnert

The other answer is great, but I would like to point out that regular expressions are not the best way to find a string inside another string. In Python, simply write:

    if term in string:
        pass  # do whatever
Answered By: Patrick

Also make sure that your arguments are in the correct order!

I was trying to run a regular expression on some html code. I kept getting the multiple repeat error, even with very simple patterns of just a few letters.

Turns out I had the pattern and the html mixed up. I tried re.findall(html, pattern) instead of re.findall(pattern, html).
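Here is a small reproduction of that mix-up, as a sketch. The HTML snippet and pattern are invented for the example; the point is only that the first argument is always compiled as the regex.

```python
import re

html = "<p>2 ** 3 is 8</p>"   # made-up snippet; note the "**"
pattern = "is"

# Correct order: pattern first, then the text to search.
correct = re.findall(pattern, html)
print(correct)

# Swapped order: the HTML gets compiled as a regex, and its "**" is rejected.
swapped_error = None
try:
    re.findall(html, pattern)
except re.error as exc:
    swapped_error = exc
    print("re.error:", exc)
```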

Answered By: Roald

I have example_str = "i love you c++", and using it in a regex raises the "multiple repeat" error. The error occurs because the string contains "++", and + is a special character (a quantifier) in regex syntax. My fix was to use re.escape on the term; here is my code:

import re

example_str = "i love you c++"
word_filter = "c++"  # assumed search term
# The pattern now compiles, but \b cannot match after the final "+", so this returns None
regex_word = re.search(rf'\b{re.escape(word_filter)}\b', example_str)
Answered By: sonpxp

A general solution to "multiple repeat" is using re.escape to match the literal pattern.
Example:

>>> re.compile(re.escape("c++"))
re.compile('c\\+\\+')

However, if you want to match a literal word with a space before and after, try this example:

>>> re.findall(rf"\s{re.escape('c++')}\s", "i love c++ you c++")
[' c++ ']
Answered By: LazerDance