How can I modify a python regular expression to check whether string elements exist in another string?
Question:
I know I have asked a similar question before, but I want to modify it a bit to account for a new, specific use case.
I have a string such as SIT,UAT (call it a1,a2), where a1 and a2 can be any sequence of characters, separated by a comma. There can be any number of unique elements, such as an a3 and an a4. Each element (a1, a2, up to aN) occurs only once in any given combination.
I need a Python regex that checks whether only SIT and UAT exist in a particular comma-separated string, when the input list has more than one element.
Scenarios:
Input 1: SIT,UAT
SIT,UAT: should match the regex
UAT,SIT: should match the regex
SIT: should fail, as SIT and UAT are not both present
UAT: should fail, as SIT and UAT are not both present
TRA,SIT,UAT: should fail, as only SIT and UAT may be present together, with no other elements (TRA was not provided in the input list)
Thanks in advance!
Answers:
The regular expression you probably want to use here is:
^(?:SIT,UAT|UAT,SIT)$
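As a quick check, the pattern above can be run against the five scenarios from the question. This is only a minimal sketch: the helper name matches is made up for illustration and is not part of the original answer.

```python
import re

# Pattern from the answer: the whole string must be one of the two orderings.
PATTERN = re.compile(r'^(?:SIT,UAT|UAT,SIT)$')

def matches(value: str) -> bool:
    """Return True if value is exactly SIT,UAT or UAT,SIT (hypothetical helper)."""
    return PATTERN.match(value) is not None

# The five scenarios listed in the question.
for candidate in ["SIT,UAT", "UAT,SIT", "SIT", "UAT", "TRA,SIT,UAT"]:
    print(candidate, matches(candidate))
```

Only the first two candidates should print True.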
Sample Pandas code:
import re

def valid(env1, env2):
    # Escape the values so regex metacharacters are matched literally
    e1, e2 = re.escape(env1), re.escape(env2)
    pat = rf'^(?:{e1},{e2}|{e2},{e1})$'
    return df["col"].str.contains(pat, regex=True)
If you need to cater to more than two expected CSV values, then regex might not scale nicely. In that case, I would suggest splitting the input on comma and then using the base string functions:
inp = "TST,SIT,UAT,PROD"
vals = inp.split(",")
allowed = ["SIT", "UAT"]
# Compare as sets: every allowed value must be present, and nothing else
output = set(vals) == set(allowed)
print(output) # False, because the input has TST and PROD
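To illustrate why a regex might not scale nicely, here is a rough sketch that builds one alternative per ordering of the allowed values. The helper build_pattern is hypothetical and only meant to show how quickly the number of alternatives grows (N! for N values).

```python
import re
from itertools import permutations

def build_pattern(allowed):
    """Build a regex matching any ordering of the allowed values (hypothetical helper)."""
    escaped = [re.escape(v) for v in allowed]
    # One comma-joined alternative per permutation of the values.
    alternatives = [",".join(p) for p in permutations(escaped)]
    return re.compile(r'^(?:' + "|".join(alternatives) + r')$')

pattern = build_pattern(["SIT", "UAT", "PROD"])
print(pattern.pattern.count("|") + 1)       # number of alternatives: 3! = 6
print(bool(pattern.match("PROD,SIT,UAT")))  # True: all three values, any order
print(bool(pattern.match("SIT,UAT")))       # False: PROD is missing
```

With three values the pattern already has 6 alternatives, and with five it would have 120, which is why splitting the string tends to be the simpler option.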
If I understand your question correctly, this is one way to solve it. Unless your match values are fixed to SIT,UAT or only a few known values, I would suggest not going with regex and instead solving it by splitting the string into a list.
def verify(value, match):
    # Sort both element lists so that the order of elements does not matter
    match_list = sorted(match.split(','))
    input_list = sorted(value.split(','))
    return input_list == match_list

match = "SIT,UAT"
print(verify("SIT,UAT", match))      # True
print(verify("UAT,SIT", match))      # True
print(verify("SIT", match))          # False
print(verify("UAT", match))          # False
print(verify("TRA,SIT,UAT", match))  # False
The above assumes nothing about your match string. If you know that repeats don’t matter, you could use set comparison instead of list comparison.
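A minimal sketch of the set-based variant, assuming repeated elements should be ignored. The name verify_set is made up for illustration.

```python
def verify_set(value: str, match: str) -> bool:
    """Order-insensitive check that also ignores repeated elements (hypothetical helper)."""
    return set(value.split(",")) == set(match.split(","))

match = "SIT,UAT"
print(verify_set("UAT,SIT", match))      # True
print(verify_set("SIT", match))          # False
print(verify_set("TRA,SIT,UAT", match))  # False
print(verify_set("SIT,SIT,UAT", match))  # True: the repeated SIT is ignored
```

The last call is where the two approaches differ: comparing sorted lists would treat SIT,SIT,UAT as a mismatch.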
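Use the following pattern:
pattern = ^(?=^(SIT,UAT)|(UAT,SIT)$)(?!(.*?,(?!SIT|UAT))+).*
(?=^(SIT,UAT)|(UAT,SIT)$) checks whether the string starts with SIT,UAT or is exactly UAT,SIT.
(?!(.*?,(?!SIT|UAT))+) checks that no comma is followed by anything other than SIT or UAT.
Because the alternation in the first lookahead is not grouped, and the second check only looks at what follows each comma, strings such as SIT,UATX or SIT,UAT,SIT also match, so this pattern is looser than ^(?:SIT,UAT|UAT,SIT)$.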
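The lookahead pattern can be tried against the scenarios from the question with a short script. This is only a sketch, and the helper name matches is an assumption for illustration; the last candidate is an extra edge case that is not in the question.

```python
import re

# Lookahead-based pattern, copied as given in this answer.
LOOKAHEAD = re.compile(r'^(?=^(SIT,UAT)|(UAT,SIT)$)(?!(.*?,(?!SIT|UAT))+).*')

def matches(value: str) -> bool:
    """Return True if the lookahead pattern accepts value (hypothetical helper)."""
    return LOOKAHEAD.match(value) is not None

# Five scenarios from the question, plus one edge case with a longer second element.
for candidate in ["SIT,UAT", "UAT,SIT", "SIT", "UAT", "TRA,SIT,UAT", "SIT,UATX"]:
    print(candidate, matches(candidate))
```

The five scenarios from the question behave as required, but SIT,UATX is also accepted, which is worth keeping in mind if element names can share a prefix.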
Hope this helps.