Regex string between square brackets only if '.' is within string
Question:
I’m trying to detect the text between two square brackets in Python, but I only want the result where there is a "." within it.
I currently have \[(.*?)\] as my regex, using the following example:
String To Search:
CASE[Data Source].[Week] = ‘THIS WEEK’
Result:
Data Source, Week
However, I need the whole string [Data Source].[Week] (square brackets included), and only when there is a ‘.’ in the middle of the string. There could also be multiple places where it matches.
Answers:
You might write a pattern matching [...] and then repeat, 1 or more times, a . followed by another [...]
\[[^][]*](?:\.\[[^][]*])+
Explanation
\[[^][]*]
Match [...] using a negated character class, so no other brackets can appear inside
(?:
Non capture group to repeat as a whole part
\.\[[^][]*]
Match a literal dot and again [...]
)+
Close the non capture group and repeat 1+ times
See a regex demo.
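As a rough sketch, the same pattern can be written with re.VERBOSE so that each part of the explanation sits next to the piece it describes. The name bracket_chain and the sample text (including the extra [Year] part) are made up for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# Whitespace outside character classes is ignored in this mode.
bracket_chain = re.compile(r"""
    \[ [^][]* ]          # one [...] part: '[', any chars except brackets, ']'
    (?:                  # non-capturing group, repeated as a whole
        \. \[ [^][]* ]   # a literal dot followed by another [...] part
    )+                   # one or more times
""", re.VERBOSE)

# Hypothetical input: [Year] has no dot-joined neighbour, so it is skipped.
text = "CASE[Data Source].[Week] = 'THIS WEEK' AND [Year] = 2020"
match = bracket_chain.search(text)
print(match.group() if match else None)
```

A lone [Year] is not matched here because the group after the first [...] has to occur at least once.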
To get multiple matches, you can use re.findall
import re
pattern = r"\[[^][]*](?:\.\[[^][]*])+"
s = ("CASE[Data Source].[Week] = 'THIS WEEK'\n"
     "CASE[Data Source].[Week] = 'THIS WEEK'")
print(re.findall(pattern, s))
Output
['[Data Source].[Week]', '[Data Source].[Week]']
If you also want the values between square brackets when there is no dot, you can use an alternation with lookaround assertions:
\[[^][]*](?:\.\[[^][]*])+|(?<=\[)[^][]*(?=])
Explanation
\[[^][]*](?:\.\[[^][]*])+
The same as the previous pattern
|
Or
(?<=\[)[^][]*(?=])
Match the text inside [...], asserting [ to the left and ] to the right
See another regex demo
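A minimal sketch of how the alternation might behave with re.findall. The sample string, with its extra [Year] part, is an assumption added here to show the second branch.

```python
import re

# First branch: dot-joined [...] parts, brackets included.
# Second branch: only the text inside a single [...], via lookarounds.
pattern = r"\[[^][]*](?:\.\[[^][]*])+|(?<=\[)[^][]*(?=])"

# Hypothetical input with one dotted name and one standalone [Year].
s = "CASE[Data Source].[Week] = 'THIS WEEK' AND [Year] = 2020"

result = re.findall(pattern, s)
print(result)
```

Because the dotted branch comes first in the alternation, a dot-joined name is returned whole with its brackets, while a standalone [...] yields only its inner text.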
I think an alternative approach could be:
import re
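# Raw string with escaped brackets and dot; the group wraps the whole match
pattern = re.compile(r"(\[[^\]]*\]\.\[[^\]]*\])")
sss = "CASE[Data Source].[Week] = 'THIS WEEK'"
print(pattern.findall(sss))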
OUTPUT
['[Data Source].[Week]']
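The two answers may differ on names with more than two parts: the first pattern repeats the dot-and-bracket group, while this one matches exactly two bracketed parts. A small comparison, where the names chain and pair and the three-part sample string are made up for illustration:

```python
import re

# Pattern from the first answer: one [...] followed by one or more .[...]
chain = re.compile(r"\[[^][]*](?:\.\[[^][]*])+")
# Pattern from this answer: exactly two [...] parts joined by a dot
pair = re.compile(r"(\[[^\]]*\]\.\[[^\]]*\])")

# Hypothetical three-part name plus a standalone [Year]
s = "SELECT [db].[schema].[table], [Year]"

chain_matches = chain.findall(s)
pair_matches = pair.findall(s)
print(chain_matches)
print(pair_matches)
```

On this input the first pattern returns the full three-part name, while the two-part pattern stops after [db].[schema]. Which one fits depends on whether longer chains can occur in the data.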