How can I modify a Python regular expression to check whether string elements exist in another string?

Question:

I have asked a similar question before, but I want to modify it slightly to cover a new, more specific use case.

I have a string such as SIT,UAT (call it a1,a2), where a1 and a2 can be any sequence of characters, separated by a comma. There can be any number of unique elements (a3, a4, and so on up to aN), and each element occurs only once in a given combination.

I need a Python regex that checks whether a particular comma-separated string contains exactly SIT and UAT, and nothing else, when the input list has more than one element.

Scenarios:

Input 1: SIT,UAT

  1. SIT,UAT – should match with regex
  2. UAT,SIT – should match with regex
  3. SIT – should fail, as SIT and UAT are not both present
  4. UAT – should fail, as SIT and UAT are not both present
  5. TRA,SIT,UAT – should fail, as only SIT and UAT may be present; TRA was not provided in the input list

Thanks in advance!

Asked By: supermariowhan

Answers:

The regular expression you probably want to use here is:

^(?:SIT,UAT|UAT,SIT)$
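
As a quick check, the pattern above can be run against the five scenarios from the question with the standard `re` module. This is only a sketch; the helper name `matches` is made up for illustration.

```python
import re

# Pattern from the answer: exactly "SIT,UAT" or "UAT,SIT", nothing else.
PATTERN = re.compile(r'^(?:SIT,UAT|UAT,SIT)$')

def matches(s):
    """Return True if s is exactly one of the two allowed orderings."""
    return PATTERN.match(s) is not None

# The five scenarios listed in the question.
for candidate in ["SIT,UAT", "UAT,SIT", "SIT", "UAT", "TRA,SIT,UAT"]:
    print(candidate, matches(candidate))
```

Only the first two candidates should print True.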

Sample Pandas code:

import re

def valid(env1, env2):
    # df is assumed to be an existing DataFrame; escape regex metacharacters.
    e1, e2 = re.escape(env1), re.escape(env2)
    pat = rf'^(?:{e1},{e2}|{e2},{e1})$'
    return df["col"].str.contains(pat, regex=True)

If you need to cater to more than two expected CSV values, then regex might not scale nicely. In that case, I would suggest splitting the input on comma and then using the base string functions:

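inp = "TST,SIT,UAT,PROD"
vals = inp.split(",")
allowed = ["SIT", "UAT"]
# Same number of elements and every element allowed (elements are unique).
output = len(vals) == len(allowed) and all(p in allowed for p in vals)
print(output)  # False, because the input has TST and PROD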
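
To illustrate why the regex approach may not scale nicely, here is a rough sketch that builds the alternation for any number of expected values. The helper `build_pattern` is hypothetical and not part of the original answer; it simply lists every ordering, so the pattern grows factorially.

```python
import re
from itertools import permutations

def build_pattern(values):
    """Build an anchored regex matching any ordering of the given values."""
    # re.escape guards against values that contain regex metacharacters.
    escaped = [re.escape(v) for v in values]
    alternatives = (",".join(p) for p in permutations(escaped))
    return re.compile(r'^(?:' + "|".join(alternatives) + r')$')

pat = build_pattern(["SIT", "UAT"])
print(bool(pat.match("UAT,SIT")))      # True
print(bool(pat.match("TRA,SIT,UAT")))  # False

# The number of alternatives is n! for n values.
print(len(list(permutations(range(5)))))  # 120
```

With five values the pattern already contains 120 alternatives, which is one reason splitting on the comma tends to be the simpler route.
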
Answered By: Tim Biegeleisen

If I understand your question correctly, this is one way to solve it. Unless your match values are fixed to SIT,UAT or only a few known values, I’d suggest not going with regex and instead solving it by splitting the list.

def verify(candidate, match):
    # sorted() already returns a list; sorting makes the comparison order-insensitive.
    match_list = sorted(match.split(','))
    candidate_list = sorted(candidate.split(','))
    return candidate_list == match_list

match = "SIT,UAT"
print(verify("SIT,UAT", match))     # True
print(verify("UAT,SIT", match))     # True
print(verify("SIT", match))         # False
print(verify("UAT", match))         # False
print(verify("TRA,SIT,UAT", match)) # False

The above assumes nothing about your match string. If you know that repeats don’t matter, you could use set comparison instead of list.
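
A minimal sketch of the set-based variant, assuming repeated elements should be ignored; the name `verify_set` is made up for illustration.

```python
def verify_set(candidate, expected):
    """Compare the comma-separated elements as sets (order and repeats ignored)."""
    return set(candidate.split(",")) == set(expected.split(","))

expected = "SIT,UAT"
print(verify_set("UAT,SIT", expected))      # True
print(verify_set("SIT,UAT,SIT", expected))  # True: the repeat is ignored
print(verify_set("TRA,SIT,UAT", expected))  # False
```

Note that the sorted-list comparison would reject "SIT,UAT,SIT", whereas the set comparison accepts it.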

Answered By: Rodrigo Rodrigues

Use the following pattern:

pattern = r'^(?=(?:SIT,UAT|UAT,SIT)$)(?!(?:.*?,(?!SIT|UAT))+).*'
  • (?=(?:SIT,UAT|UAT,SIT)$) checks that the string is exactly SIT,UAT or UAT,SIT. The alternation is grouped so that the $ anchor applies to both orderings.
  • (?!(?:.*?,(?!SIT|UAT))+) checks that no comma is followed by anything that does not start with SIT or UAT.
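
A lookahead pattern of this kind can be checked against the scenarios from the question as sketched below. The sketch groups the alternation inside the first lookahead so that both anchors apply to either ordering; that grouping is an assumption made here so that inputs such as SIT,UATX are rejected.

```python
import re

# Lookahead pattern with the alternation grouped, so that ^ and $
# constrain both orderings (an adjustment made for this sketch).
pattern = re.compile(r'^(?=(?:SIT,UAT|UAT,SIT)$)(?!(?:.*?,(?!SIT|UAT))+).*')

# The five scenarios listed in the question.
for candidate in ["SIT,UAT", "UAT,SIT", "SIT", "UAT", "TRA,SIT,UAT"]:
    print(candidate, bool(pattern.match(candidate)))
```

Only SIT,UAT and UAT,SIT should print True.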

Hope this helps.

Answered By: Paritosh Darekar