Create a regex that matches balanced parentheses with a maximum depth of 5

Question:

I have a problem with a regex. It is supposed to match if the nesting depth is between 1 and 5. For example, it should match “()()()”, “((((()))))” and “(()((()))())”, and it should not match “())()”, “(((((())))))” or “(x)”.
This is what I have so far:

    pattern = r'^(?:\(\)|\(((?:\(\)|[^()])*)\)){0,5}$'
    return pattern

Asked By: alex

Answers:

Here are two solutions for balanced < and > instead, as that’s easier to read/write:

(<>|<(<>|<(<>|<(<>|<(<>)+>)+>)+>)+>)+

(<((<((<((<((<>)+)?>)+)?>)+)?>)+)?>)+

And the same but with < and > replaced with ( and ):

(\(\)|\((\(\)|\((\(\)|\((\(\)|\((\(\))+\))+\))+\))+\))+

(\(((\(((\(((\(((\(\))+)?\))+)?\))+)?\))+)?\))+

I built them by starting with a pattern for depth 1, then using it to build a pattern for depths 1 to 2, then using that for a pattern for depths 1 to 3, and so on, up to 5:

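p = r'(<>)+'                      # depth 1
for _ in range(4):                # wrap four more times: depths 1 to 5
    # Alternative construction: p = r'(<>|<p>)+'.replace('p', p)
    p = r'(<(p)?>)+'.replace('p', p)
# Swap in escaped, literal parentheses.
pattern = p.replace('<', r'\(').replace('>', r'\)')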

print(p)
print(pattern)
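
The same construction can be wrapped in a function for an arbitrary maximum depth. This is only a minimal sketch and not part of the original answer; the name `build_pattern` and its `max_depth` parameter are made up for illustration.

```python
import re

def build_pattern(max_depth):
    """Regex for non-empty balanced parentheses nested at most max_depth deep."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    p = r'(<>)+'                      # depth 1
    for _ in range(max_depth - 1):    # each pass allows one more level
        p = r'(<(p)?>)+'.replace('p', p)
    # Only at the end swap in escaped, literal parentheses.
    return p.replace('<', r'\(').replace('>', r'\)')

depth3 = re.compile(build_pattern(3))
print(bool(depth3.fullmatch("((()))")))    # True: depth 3 is allowed
print(bool(depth3.fullmatch("(((())))")))  # False: depth 4 is too deep
```

Building with `<` and `>` first and escaping last keeps the intermediate strings readable, which is the same reason the answer gives for using them.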

Testing it:

import re

good = "()()()", "((((()))))", "(()((()))())"
bad = "())()", "(((((())))))", "(x)"

for s in good:
    print(re.fullmatch(pattern, s))

for s in bad:
    print(re.fullmatch(pattern, s))

Test results (Attempt This Online!):

<re.Match object; span=(0, 6), match='()()()'>
<re.Match object; span=(0, 10), match='((((()))))'>
<re.Match object; span=(0, 12), match='(()((()))())'>
None
None
None
Answered By: Kelly Bundy

Unless it has to be a single expression, you could use re.sub five times over to replace "()" with an empty string. Each pass removes the innermost pairs, so the result is an empty string if all parentheses match up and are nested at most 5 levels deep:

import re

def parMatch(S):
    oc = r"\(\)"  # a literal "()" pair; the parentheses must be escaped
    return not re.sub(oc,"",re.sub(oc,"",re.sub(oc,"",re.sub(oc,"",re.sub(oc,"",S)))))

Test and output:

tests = ["()()()", "((((()))))", "(()((()))())","())()", "(((((())))))","(x)"]
for S in tests:
    print(S,parMatch(S))

()()() True
((((())))) True
(()((()))()) True
())() False
(((((()))))) False
(x) False

Obviously, this approach does not need regular expressions at all (str.replace would do the same job), so it’s probably not what is expected of you.
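
For comparison, here is a rough sketch of a version with no regex at all that just tracks the current depth with a counter. The name `balanced_up_to` and the `max_depth` parameter are made up for this example, and rejecting the empty string is my assumption, based on the question asking for a depth of at least 1.

```python
def balanced_up_to(s, max_depth=5):
    """True if s is non-empty, contains only ( and ), is balanced,
    and never nests deeper than max_depth."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
            if depth > max_depth:
                return False   # nested too deep
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False   # closing parenthesis without an opener
        else:
            return False       # any other character is rejected
    # Assumption: the empty string (depth 0) should not count as a match.
    return depth == 0 and s != ""

tests = ["()()()", "((((()))))", "(()((()))())", "())()", "(((((())))))", "(x)"]
for s in tests:
    print(s, balanced_up_to(s))
```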

If you need it to be a single expression, you could nest non-capturing groups in a recurring pattern: an opening parenthesis, a valid pairing repeated zero or more times, then a closing parenthesis. Make the outer group repeat at least once and span the whole string (^(...)+$):

def parMatch(S):
    oc = r"^(\((?:\((?:\((?:\((?:\(\))*\))*\))*\))*\))+$"
    return bool(re.match(oc,S))

The recurring pattern is \((?:...)*\), which you nest within itself so that there are 5 levels in total, the innermost being \(\) for the innermost matching parentheses.

Note that the non-capturing groups are just there to avoid getting multiple entries in the match object. Since you’re only looking for True/False, and not the extracted string itself, it should work with capturing groups as well: r"^(\((\((\((\((\(\))*\))*\))*\))*\))+$"
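
A quick check of the single-expression variant against the strings from the question. This is only a sketch: it assumes every literal parenthesis in the pattern is backslash-escaped, and `par_match_single` is just an illustrative name.

```python
import re

# Five nested levels; every literal parenthesis is escaped.
NESTED = re.compile(r"^(\((?:\((?:\((?:\((?:\(\))*\))*\))*\))*\))+$")

def par_match_single(s):
    """True if s is one or more balanced groups nested at most 5 deep."""
    return bool(NESTED.match(s))

tests = ["()()()", "((((()))))", "(()((()))())", "())()", "(((((())))))", "(x)"]
for s in tests:
    print(s, par_match_single(s))
```

With the question's examples this should report True for the first three strings and False for the last three, the same as the re.sub version.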

Answered By: Alain T.

A shorter pattern, found in a different way than in my first answer:

With <>:  (<(<(<(<(<>)*>)*>)*>)*>)+
With ():  (\((\((\((\((\(\))*\))*\))*\))*\))+

First I wrote a DFA (using a and b instead of parentheses):

(Image: state diagram of the DFA)

si is the initial state, se is an error state, and s0 to s5 tell how many parentheses are currently open (0 to 5).

That image was created by FSM2Regex when I entered this DFA:

#states
si
s0
s1
s2
s3
s4
s5
se
#initial
si
#accepting
s0
#alphabet
a
b
#transitions
si:a>s1
si:b>se
s0:a>s1
s0:b>se
s1:a>s2
s1:b>s0
s2:a>s3
s2:b>s1
s3:a>s4
s3:b>s2
s4:a>s5
s4:b>s3
s5:a>se
s5:b>s4
se:a>se
se:b>se

It also gave me this pattern:

a(a(a(a(ab)*b)*b)*b)*(b+b(a(a(a(a(ab)*b)*b)*b)*b)*($+a(a(a(a(ab)*b)*b)*b)*b))

Note that $ means the empty string there and + means alternation. It doesn’t use + to mean "one or more of the previous thing", so I wrote the short pattern at the top of this answer myself after seeing this.
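
One way to gain some confidence in the short pattern is to simulate the DFA directly and compare it with the regex on every string over a and b up to some length. This is a sketch of such a check, not something from FSM2Regex; `dfa_accepts` and the length limit of 12 are my own choices.

```python
import re
from itertools import product

# The short pattern from the top of this answer, written with a and b.
SHORT = re.compile(r"(a(a(a(a(ab)*b)*b)*b)*b)+")

def dfa_accepts(s):
    """Simulate the DFA: count open parentheses, staying within 0 to 5."""
    if not s:
        return False          # the initial state si is not accepting
    depth = 0
    for ch in s:
        depth += 1 if ch == "a" else -1
        if depth < 0 or depth > 5:
            return False      # error state se, which is never left
    return depth == 0         # accepting state s0

# Collect every string of length 0..12 on which the two disagree.
mismatches = [
    "".join(t)
    for n in range(13)
    for t in product("ab", repeat=n)
    if dfa_accepts("".join(t)) != bool(SHORT.fullmatch("".join(t)))
]
print(mismatches)  # []
```

Length 12 is enough to include strings that reach depth 6, so the depth limit itself gets exercised.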

Answered By: Kelly Bundy