Regular expression to find all lowercase strings containing a dot in Python

Question:

I am trying to use Python to find all strings inside double quotes that have a domain-name-like format, such as "abc.def.ghi".

I am currently using re.findall('"([a-z\.]+[a-z]*)"', input_string),

where [a-z\.]+ is meant for abc. and def., and [a-z]* is meant for ghi.

So far it matches every string like "abc.def.ghi" without issue, but it also matches strings that contain no dot, such as "opq" and "rst".

How can I exclude the strings that contain no dot using a regex?

Asked By: Frank


Answers:

[a-z\.]+

This part matches any character that is either a-z or a dot, one or more times, so a dot is never required.
If you want the dot to be required, you have to move it outside the character set,
something like:

([a-z]+\.)+
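As a rough sketch of the difference, the snippet below runs both patterns over a made-up sample string (the text and variable names are assumptions for illustration only). It keeps the question's trailing [a-z]* and writes the repeated group as non-capturing, an adjustment made here so that re.findall still returns the whole quoted string.

```python
import re

# Hypothetical sample input, not taken from the question.
text = 'x = "abc.def.ghi", y = "opq", z = "rst"'

# Original pattern: the dot sits inside the character class, so it is optional.
original = re.findall(r'"([a-z\.]+[a-z]*)"', text)

# Dot moved outside the class: every repeated group must end with a literal dot.
# (?:...) is non-capturing, so findall reports only the outer group.
fixed = re.findall(r'"((?:[a-z]+\.)+[a-z]*)"', text)

print(original)  # ['abc.def.ghi', 'opq', 'rst']
print(fixed)     # ['abc.def.ghi']
```

Because the tail is still [a-z]*, this variant would also accept a string ending in a dot such as "abc."; changing the tail to [a-z]+ would rule that out.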


Answered By: cayonara

Pattern

r'"([a-z]+(?:\.[a-z]+)+)"'

Explanation

  • Start and end with a double quote
  • capture group
    • [a-z]+ one or more letters a-z
    • (?:…) nested non-capturing subgroup of the capture group
      • a literal period followed by at least one letter a-z
      • the nested subgroup is repeated at least once
      • the subgroup is non-capturing because otherwise findall would return a tuple of both groups for each match instead of just the full string
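The effect of the non-capturing subgroup can be seen in a small comparison. This is only an illustrative sketch: the variable names are made up, and the dot is written as \. so that it matches a literal period only.

```python
import re

sample = '"abc.def.ghi"'

# Inner group capturing: findall returns one tuple per match, with one item
# per group; a repeated group keeps only its last repetition.
with_capture = re.findall(r'"([a-z]+(\.[a-z]+)+)"', sample)

# Inner group non-capturing: findall returns just the outer group's text.
without_capture = re.findall(r'"([a-z]+(?:\.[a-z]+)+)"', sample)

print(with_capture)     # [('abc.def.ghi', '.ghi')]
print(without_capture)  # ['abc.def.ghi']
```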

Usage

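import re

# The escaped dot (\.) matches a literal period; findall returns the outer capture group.
pattern = re.compile(r'"([a-z]+(?:\.[a-z]+)+)"')
tests = ['"abc.def.ghi"', '"opq"']
for input_string in tests:
    print(f"input_string: {input_string}, found:  {pattern.findall(input_string)}")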

Output

input_string: "abc.def.ghi", found:  ['abc.def.ghi']
input_string: "opq", found:  []
Answered By: DarrylG