How can you group a very specific pattern with regex?

Question:

Problem:

https://coderbyte.com/editor/Simple%20Symbols

The str parameter will be composed of + and = symbols with
several letters between them (e.g. ++d+===+c++==a), and for the string
to be true each letter must be surrounded by a + symbol. So the example
string would be false. The string will not be empty and will have
at least one letter.

Input:"+d+=3=+s+"

Output:"true"

Input:"f++d+"

Output:"false"
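Before reaching for a regex, the rule can be written as a plain loop, which is handy for cross-checking any pattern. This is a minimal sketch that assumes str.isalpha() is an acceptable definition of "letter" (it also accepts non-ASCII letters). The function name simple_symbols_loop is hypothetical:

```python
def simple_symbols_loop(s: str) -> str:
    """Return "true" if every letter in s has a '+' directly on both sides."""
    for i, ch in enumerate(s):
        if ch.isalpha():  # assumption: isalpha() defines "letter"
            # A letter at either end of the string cannot be surrounded.
            if i == 0 or i == len(s) - 1:
                return "false"
            if s[i - 1] != "+" or s[i + 1] != "+":
                return "false"
    return "true"


print(simple_symbols_loop("+d+=3=+s+"))  # true
print(simple_symbols_loop("f++d+"))      # false
```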

I’m trying to write a regular expression for the problem above, but I keep running into issues. How can I write a pattern that matches the specified rule (‘+D+’)?

import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []

Here I thought I could create my own character class that searches for the pattern.

import re
plusReg = re.compile(r'([\+,D,\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']

Here I thought the ‘\+’ would single out the plus symbol and treat it as a literal character.

mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'

Here, in the same shell, I tried search instead of findall, but I only got the first letter, which isn’t even surrounded by a plus.

My goal is to group the string ‘adf+a+=4=+S+’ into ['+a+', '+S+'] and so on.
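For reference, a short sketch of what is going on in the attempts above, using only the standard re module (the variable names are illustrative). A character class [...] matches a single character from a set, so it cannot express "plus, letter, plus"; that needs a sequence with escaped plus signs. findall then returns all matches, while search returns only the first one:

```python
import re

text = 'adf+a+=4=+S+'

# Inside [...] nearly every character is literal, and the class matches
# exactly ONE character: here a '+', a ',' or a 'D'.
char_class = re.compile(r'[\+,D,\+]')
print(char_class.findall('a,D+'))  # [',', 'D', '+']

# A sequence ("+", then a letter, then "+") is written outside brackets,
# with the plus signs escaped so they are not treated as repetition.
plus_letter = re.compile(r'\+[A-Za-z]\+')

# findall returns every non-overlapping match as a list of strings.
print(plus_letter.findall(text))  # ['+a+', '+S+']

# search stops at the first match and returns a Match object (or None).
mo = plus_letter.search(text)
print(mo.group())  # +a+

# finditer yields a Match object for every match, including its position.
for m in plus_letter.finditer(text):
    print(m.group(), m.start())  # "+a+ 3", then "+S+ 9"
```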

Asked By: Allen Birmingham


Answers:

Something like this should do the trick:

import re

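def is_valid_str(s):
    # Every letter in s must also be found as a letter that has a '+'
    # before it and a '+' after it. The lookahead (?=\+) checks the
    # trailing '+' without consuming it, so '+a+b+' is handled correctly.
    return re.findall(r'[a-zA-Z]', s) == re.findall(r'\+([a-zA-Z])(?=\+)', s)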

Usage:

In [10]: is_valid_str("f++d+")
Out[10]: False

In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True
Answered By: Jack

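I think you are on the right track, but the pattern only needs a letter between two literal plus signs. Note that + has to be escaped as \+, because an unescaped + is the repetition operator:

search_pattern = re.compile(r'\+[a-zA-Z]\+')

The class [a-zA-Z] matches upper- and lowercase letters (A-z would also match [, \, ], ^, _ and `). Now we can use this regex with the findall function: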

results = re.findall(search_pattern, 'adf+a+=4=+S+')  # returns ['+a+', '+S+']

The question asks for a boolean indicating whether the string follows the specified pattern, so we can wrap this all up into a function:

import re

def is_valid_pattern(pattern_string):
    # A letter with a '+' directly before and after it. The lookarounds
    # keep a shared '+' (as in '+a+b+') available for the next letter.
    search_pattern = re.compile(r'(?<=\+)[a-zA-Z](?=\+)')
    letter_pattern = re.compile(r'[a-zA-Z]')  # to search for all letters
    results = re.findall(search_pattern, pattern_string)
    letters = re.findall(letter_pattern, pattern_string)
    # If the number of letters equals the number of surrounded letters
    # found with the pattern, we can say that it is a valid string.
    return len(results) == len(letters)
Answered By: Lucas Currah

You should be looking for what isn’t there, as opposed to what is. Search for something like ([^+][A-Za-z]|[A-Za-z][^+]). The | in the middle is a logical OR operator. Each side checks for a letter without a “+” on its left or right, respectively. If the search finds something, the string fails. If it can’t find anything, there are no letters that lack a surrounding “+”.
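A minimal sketch of this "search for violations" idea. One caveat: both alternatives need a character next to the letter, so a letter at the very start or end of the string is never flagged (a lone 'a' would pass). Padding the string with a character that is neither a letter nor '+', as the OP's own answer does, closes that gap. The helper name is_valid is hypothetical:

```python
import re

# Any letter with a non-'+' character on its left or on its right.
violation = re.compile(r'[^+][A-Za-z]|[A-Za-z][^+]')


def is_valid(s: str) -> bool:
    # Pad with a neutral character so a letter at either end of the
    # string still has a (non-'+') neighbour the pattern can see.
    padded = '=' + s + '='
    return violation.search(padded) is None


# Without padding, a lone letter at the edges slips through:
print(violation.search('a') is None)  # True (wrongly looks valid)
print(is_valid('a'))                  # False
print(is_valid('+d+=3=+s+'))          # True
print(is_valid('f++d+'))              # False
```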

Answered By: user6732861

One approach is to search the string for any letters that are either (1) not preceded by a +, or (2) not followed by a +. This can be done using lookbehind and lookahead assertions:

>>> import re
>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')

So if rgx.search(string) returns None, the string is valid:

>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True

but if it returns a match, the string is invalid:

>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False

The important thing about lookahead/lookbehind assertions is that they don’t consume characters, so a single + can act as the right neighbour of one letter and the left neighbour of the next, which handles overlapping matches.
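A quick illustration of that point with '+a+b+', where the middle '+' is shared by both letters:

```python
import re

s = '+a+b+'

# Consuming both '+' signs: the middle '+' is used up by the first match,
# so 'b' is never found.
print(re.findall(r'\+[a-zA-Z]\+', s))           # ['+a+']

# Lookbehind/lookahead only check the neighbours without consuming them,
# so the shared '+' can serve both letters.
print(re.findall(r'(?<=\+)[a-zA-Z](?=\+)', s))  # ['a', 'b']
```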

Answered By: ekhumoro

import re

def SimpleSymbols(s):
    # Added padding, because for s = 'y+4==+r+' the program would
    # otherwise return "true" when it should return "false".
    string = '=' + s + '='
    # Regex that matches any letter that *doesn't* have a + in front or behind.
    plusReg = re.compile(r'[^+][A-Za-z].|.[A-Za-z][^+]')
    # Return "true" if the regex finds no letter without a + behind
    # or in front of it.
    if plusReg.search(string) is None:
        return "true"
    return "false"

print(SimpleSymbols(input()))

I borrowed some code from ekhumoro and Sanjay.


This answer was posted as an edit to the question How can you group a very specific pattern with regex? by the OP Allen Birmingham under CC BY-SA 3.0.

Answered By: vvvvv