Collecting Values from a String between two different characters including the last value
Question:
Example string:
x=0, y=0, width=1920, height=1080, width_mm=531, height_mm=299, name='\\\\.\\DISPLAY4', is_primary=True
I want to get every value after the "=" sign.
With
print(re.findall(r"=(.*?),", "x=0, y=0, width=1920, height=1080, width_mm=531, height_mm=299, name='\\\\.\\DISPLAY4', is_primary=True"))
I get:
['0', '0', '1920', '1080', '531', '299', "'\\\\.\\DISPLAY4'"]
But I also want the "True" from "is_primary".
With
=(.*?)(,|$)
I can split each match into two groups and fetch the values from group 1 with a for loop, but I think there is a more elegant way, and I would really like to see it.
Is it maybe even possible to get "DISPLAY4" out of "'\\\\.\\DISPLAY4'" in the same expression?
Answers:
You can use re.findall with a single capture group, and use negated character classes to exclude , and = (and whitespace) from what is matched before and after the = sign.
If the values themselves cannot contain ' you could also exclude matching it:
[^=\s,]+=[\\.']*([^=\s,']+)
import re
pattern = r"[^=\s,]+=[\\.']*([^=\s,']+)"
s = "x=0, y=0, width=1920, height=1080, width_mm=531, height_mm=299, name='\\\\\\\\.\\\\DISPLAY4', is_primary=True"
print(re.findall(pattern, s))
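The caveat about values that contain ' can be seen with a small sketch. The two input strings below are hypothetical and not from the question; the pattern is the single-group one, written with \s for whitespace and \\ for a literal backslash. A value is cut off at the first ', and a leading dot is dropped because [\\.']* consumes it before the capture group starts.

```python
import re

# Single-group pattern: key, "=", optional run of backslashes/dots/quotes, then the value.
pattern = r"[^=\s,]+=[\\.']*([^=\s,']+)"

# Hypothetical inputs, chosen only to show the limits of this pattern.
quoted = re.findall(pattern, "title=O'Brien, id=7")
dotted = re.findall(pattern, "scale=.5, id=7")

print(quoted)  # ['O', '7']  -> the value stops at the first '
print(dotted)  # ['5', '7']  -> the leading dot is consumed before the group
```

For the string in the question neither case occurs, so the simple pattern is enough there.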
A bit more precise match with 2 capture groups:
[^=\s,]+=(?:'(?:\\+\.\\+)?([^\s,=']+)'|([^\s,=]+))
The pattern matches:
[^=\s,]+=
Match 1+ chars other than a whitespace char, "," or "=", and then match "="
(?:
Non-capture group for the alternatives
'
Match the opening '
(?:\\+\.\\+)?
Optionally match 1+ backslashes, a dot, and again 1+ backslashes
([^\s,=']+)
Capture group 1, match 1+ chars other than a whitespace char, ",", "=" or '
'
Match the closing '
|
Or
([^\s,=]+)
Capture group 2, match 1+ chars other than a whitespace char, "," or "="
)
Close the non-capture group
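The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is only a sketch of the same two-group pattern; the shortened input string is an assumption made to keep the example small.

```python
import re

pattern = re.compile(
    r"""
    [^=\s,]+=            # key: 1+ chars other than "=", whitespace or ",", then "="
    (?:                  # non-capture group for the two alternatives
        '                #   opening quote
        (?:\\+\.\\+)?    #   optional prefix: 1+ backslashes, a dot, 1+ backslashes
        ([^\s,=']+)      #   group 1: the quoted value without that prefix
        '                #   closing quote
      |                  # or
        ([^\s,=]+)       #   group 2: an unquoted value
    )
    """,
    re.VERBOSE,
)

# Shortened sample (raw string, so every backslash is literal).
s = r"x=0, name='\\\\.\\DISPLAY4', is_primary=True"

# Only one of the two groups takes part in each match.
values = [m.group(1) or m.group(2) for m in pattern.finditer(s)]
print(values)  # ['0', 'DISPLAY4', 'True']
```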
See a regex demo and a Python demo.
import re
pattern = r"[^=\s,]+=(?:'(?:\\+\.\\+)?([^\s,=']+)'|([^\s,=]+))"
s = "x=0, y=0, width=1920, height=1080, width_mm=531, height_mm=299, name='\\\\\\\\.\\\\DISPLAY4', is_primary=True"
# Only one of the two groups takes part in each match, so take whichever matched.
res = [m.group(1) or m.group(2) for m in re.finditer(pattern, s)]
print(res)
Both will output:
['0', '0', '1920', '1080', '531', '299', 'DISPLAY4', 'True']
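As a side note on the attempt in the question: re.findall returns tuples as soon as a pattern has more than one capture group, which is why =(.*?)(,|$) needed a loop. A possible alternative, not part of the answer above, is to make the delimiter group non-capturing. The sketch below uses a shortened sample string as an assumption; note that it keeps the quotes and the backslash prefix in the name value.

```python
import re

# Shortened sample (raw string, so every backslash is literal).
s = r"x=0, y=0, width=1920, name='\\\\.\\DISPLAY4', is_primary=True"

# Two capture groups: findall returns (value, delimiter) tuples.
pairs = re.findall(r"=(.*?)(,|$)", s)
print(pairs[-1])  # ('True', '')

# Non-capturing delimiter group: findall returns a flat list of values.
values = re.findall(r"=(.*?)(?:,|$)", s)
print(values)
```

This gets the last value without a loop, but stripping the prefix to get DISPLAY4 alone still needs one of the patterns from the answer.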