Regular expression to find all lowercase strings containing a dot with Python
Question:
I am trying to use Python to find all strings inside double quotes that have a domain-name-like format, such as "abc.def.ghi".
I am currently using re.findall('"([a-z\.]+[a-z]*)"', input_string), where [a-z\.]+ is meant for abc. and def., and [a-z]* is meant for ghi.
So far it matches strings like "abc.def.ghi" without issue, but it also matches strings that contain no dot, such as "opq" and "rst".
How can I exclude the strings that contain no dot using a regex?
Answers:
The [a-z\.]+ part matches any character that is a-z or a dot, so a run of letters alone satisfies it.
If you want the dot to be required, you have to move it outside the character set, something like
([a-z]+\.)+
followed by [a-z]+ for the last part.
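A minimal sketch of the difference, assuming a made-up input_string with three quoted values. The strict pattern below is one way to complete the suggestion above, not necessarily what the answerer had in mind; its inner group is made non-capturing so that findall returns plain strings.

```python
import re

# Hypothetical sample input, not from the original question
input_string = 'x = "abc.def.ghi", y = "opq", z = "rst"'

# Original pattern: the dot sits inside the character set, so it is optional
loose = re.findall(r'"([a-z.]+[a-z]*)"', input_string)

# Dot moved outside the set: at least one "letters + dot" run is required,
# followed by a final run of letters
strict = re.findall(r'"((?:[a-z]+\.)+[a-z]+)"', input_string)

print(loose)   # ['abc.def.ghi', 'opq', 'rst']
print(strict)  # ['abc.def.ghi']
```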
Pattern
r'"([a-z]+(?:\.[a-z]+)+)"'
Explanation
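- Start and end with a double quote
- Capture group around everything between the quotes
- [a-z]+ matches one or more letters a-z
- (?:…) is a nested non-capturing subgroup of the capture group
- The subgroup matches a literal period (escaped as \.) followed by at least one letter a-z
- The nested subgroup is repeated at least once, so at least one dot is required
- The subgroup is non-capturing because otherwise findall would return a tuple per match that also includes the subgroup's last repetition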
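The effect of the non-capturing subgroup on findall can be checked with a short sketch. The variable names here are illustrative only.

```python
import re

text = '"abc.def.ghi"'

# Inner group capturing: findall returns one tuple per match,
# holding the outer group and the last repetition of the inner group
with_capturing = re.findall(r'"([a-z]+(\.[a-z]+)+)"', text)

# Inner group non-capturing: findall returns just the outer group as a string
with_non_capturing = re.findall(r'"([a-z]+(?:\.[a-z]+)+)"', text)

print(with_capturing)      # [('abc.def.ghi', '.ghi')]
print(with_non_capturing)  # ['abc.def.ghi']
```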
Usage
import re

# Capture the quoted text; the escaped \. requires a literal dot between letter runs
pattern = re.compile(r'"([a-z]+(?:\.[a-z]+)+)"')
tests = ['"abc.def.ghi"', '"opq"']
for input_string in tests:
    print(f"input_string: {input_string}, found: {pattern.findall(input_string)}")
Output
input_string: "abc.def.ghi", found: ['abc.def.ghi']
input_string: "opq", found: []
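Whether the period is escaped matters for this result. An unescaped . matches any character except a newline, so a dot-free string such as "opq" can still match (o, then p as the "any character", then q). A small sketch of the comparison, using a few extra test strings that are not from the original question:

```python
import re

# The last three test strings are hypothetical additions
tests = ['"abc.def.ghi"', '"abc-def"', '"abc def"', '"opq"']

unescaped = re.compile(r'"([a-z]+(?:.[a-z]+)+)"')   # . matches any character except newline
escaped = re.compile(r'"([a-z]+(?:\.[a-z]+)+)"')    # \. matches only a literal dot

unescaped_hits = [s for s in tests if unescaped.fullmatch(s)]
escaped_hits = [s for s in tests if escaped.fullmatch(s)]

print(unescaped_hits)  # all four test strings match
print(escaped_hits)    # ['"abc.def.ghi"']
```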