Python RegEx for exact matches of brackets
Question:
I am trying to parse a string which is of the following format:
text = "some random string <inAngle> <anotherInAngle> [-option text] [-anotherOption <text>] [-option (Y|N)]"
I want to split the string into three parts:
- Just the "some random string"
- Everything that is ONLY in angle brackets, i.e. inAngle and anotherInAngle above.
- Everything that is in square brackets.
If I use the RegEx
re.findall(r'\[(.+?)\]', text)
it gives everything I need within square brackets. If I use the same RegEx with angle brackets, however,
re.findall(r'<(.+?)>', text)
it also returns text in angle brackets that sit inside square brackets, for example "text" from above, which is within [-anotherOption]. I do not want that. The RegEx for the angle bracket match should return only "inAngle" and "anotherInAngle" from above.
What would be the RegEx for it?
Also, how do I get only the first part, i.e. "some random string"? This part can have two or three words.
Answers:
You can simply disregard everything between square brackets before searching for things in angle brackets:
interm = re.sub(r'\[(.*?)\]', '', text)
re.findall(r'<(.+?)>', interm)
outputs
['inAngle', 'anotherInAngle']
Then, for the first part, match everything up to the first [ or <. Granted, this won't work if the first part is allowed to contain either of these symbols unclosed:
re.findall(r'([^<\[]+)', text)[0]
outputs
some random string
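Putting the two steps of this answer together, a minimal sketch might look like the following. The sample string assumes the second option contains a nested <text>, and the variable names are only illustrative. Note that the leading part comes back with a trailing space, so the sketch strips it.

```python
import re

# Sample input; the nested <text> in the second option is an assumption.
text = ("some random string <inAngle> <anotherInAngle> "
        "[-option text] [-anotherOption <text>] [-option (Y|N)]")

# Step 1: remove every [...] group so nested <...> cannot match later.
without_square = re.sub(r'\[.*?\]', '', text)

# Step 2: any angle brackets left are the top-level ones.
angle = re.findall(r'<(.+?)>', without_square)

# Square-bracket contents are taken from the original string.
square = re.findall(r'\[(.+?)\]', text)

# Leading words: everything before the first '<' or '[', trimmed.
leading = re.match(r'[^<\[]*', text).group(0).strip()

print(leading)  # some random string
print(angle)    # ['inAngle', 'anotherInAngle']
print(square)   # ['-option text', '-anotherOption <text>', '-option (Y|N)']
```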
See whether this regex captures what you need:
\s*([^><\[\]]+\b)|\[([^\]]*)\]|<([^>]*)>
- \s* : preceded by optional whitespace
- ([^><\[\]]+\b) : Group 1: any run of non-bracket characters up to a word boundary \b (remove the \b if undesired)
- |\[([^\]]*)\] : or Group 2: what's inside square brackets
- |<([^>]*)> : or Group 3: what's inside angle brackets
See demo at regex101 (use “code generator” if needed)
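As a rough illustration of how the three groups could be consumed in Python, here is a sketch that uses re.finditer and checks which group matched. The pattern is written with its backslash escapes, the sample string assumes a nested <text> in the second option, and the list names are hypothetical.

```python
import re

# Sample input; the nested <text> in the second option is an assumption.
text = ("some random string <inAngle> <anotherInAngle> "
        "[-option text] [-anotherOption <text>] [-option (Y|N)]")

pattern = re.compile(r'\s*([^><\[\]]+\b)|\[([^\]]*)\]|<([^>]*)>')

leading, square, angle = [], [], []
for m in pattern.finditer(text):
    if m.group(1) is not None:      # plain text outside any brackets
        leading.append(m.group(1))
    elif m.group(2) is not None:    # contents of [...]
        square.append(m.group(2))
    else:                           # contents of a top-level <...>
        angle.append(m.group(3))

print(leading)  # ['some random string']
print(angle)    # ['inAngle', 'anotherInAngle']
print(square)   # ['-option text', '-anotherOption <text>', '-option (Y|N)']
```

Because the square-bracket alternative consumes a whole [...] group in one match, an angle bracket nested inside it never gets a chance to match on its own.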
You can simply use this regex with re.findall:
<(.+?)>(?![^\[]*\])|\[(.+?)\]|((?!\s+)[^\[\]<>]+)
See demo.
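For reference, when a pattern has several capture groups, re.findall returns one tuple per match, with empty strings for the groups that did not take part. A small sketch of unpacking those tuples, with the escapes written out and a sample string that assumes a nested <text> in the second option (all variable names are illustrative):

```python
import re

# Sample input; the nested <text> in the second option is an assumption.
text = ("some random string <inAngle> <anotherInAngle> "
        "[-option text] [-anotherOption <text>] [-option (Y|N)]")

pattern = r'<(.+?)>(?![^\[]*\])|\[(.+?)\]|((?!\s+)[^\[\]<>]+)'

# Each match is a tuple: (angle content, square content, plain text).
matches = re.findall(pattern, text)

angle = [a for a, s, w in matches if a]
square = [s for a, s, w in matches if s]
# The plain-text group keeps its trailing space, so strip it.
leading = [w.strip() for a, s, w in matches if w.strip()]

print(leading)  # ['some random string']
print(angle)    # ['inAngle', 'anotherInAngle']
print(square)   # ['-option text', '-anotherOption <text>', '-option (Y|N)']
```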