String template to regex in Python
Question:
I am trying to write a parser for file paths that follow an arbitrary naming convention.
I want to take an arbitrary string.Template and get the re.Match.groupdict() for a string that follows that template. I would also like to specify the delimiter of the string template, so that $Name1-$Name2 and $Name1_$Name2 both return {'Name1': 'foo', 'Name2': 'bar'} for their respective inputs.
It would also be nice to have a syntax for a non-capturing group, such as $[ignore]_$keep resulting in {'keep': 'foo'}.
Answers:
Found a package called lucidity that does exactly what I need.
Here is an example of a function that implements the functionality you described:
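import re

def parse_file_path(path, template, delimiter="_"):
    # A field is any non-empty run of characters that excludes the delimiter.
    field = "[^" + re.escape(delimiter) + "]+"
    parts = []
    # Split on the delimiter first so "$Name1_$Name2" is not read as "$Name1_".
    for token in template.split(delimiter):
        if re.fullmatch(r"\$\[\w+\]", token):
            parts.append("(?:" + field + ")")  # $[name]: match without capturing
        elif re.fullmatch(r"\$[A-Za-z_]\w*", token):
            parts.append("(?P<" + token[1:] + ">" + field + ")")  # $name: named group
        else:
            parts.append(re.escape(token))  # anything else is literal text
    match = re.fullmatch(re.escape(delimiter).join(parts), path)
    # Non-capturing groups never appear in groupdict(), so no filtering is needed.
    return match.groupdict() if match else None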
You can use this function as follows:
path = "foo_bar"
template = "$Name1_$Name2"
result = parse_file_path(path, template)
print(result) # Output: {'Name1': 'foo', 'Name2': 'bar'}
path = "foo-bar"
template = "$Name1-$Name2"
result = parse_file_path(path, template, delimiter="-")
print(result) # Output: {'Name1': 'foo', 'Name2': 'bar'}
path = "foo_bar"
template = "$[ignore]_$keep"
result = parse_file_path(path, template)
print(result) # Output: {'keep': 'bar'}
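If the starting point is an actual string.Template object rather than a plain string, one possible approach is to reuse the template's own compiled pattern attribute to locate the placeholders. The following is only a sketch: template_to_regex and its delimiter parameter are hypothetical names, not part of the standard library. Note that string.Template accepts underscores in identifiers, so $Name1_$Name2 would be read as a placeholder named Name1_; the braced form ${Name1}_${Name2} avoids that.

```python
import re
from string import Template

def template_to_regex(template, delimiter="_"):
    """Hypothetical helper: compile a regex with one named group per placeholder."""
    # Assumption: a field never contains the delimiter character.
    field = "[^" + re.escape(delimiter) + "]+"
    text = template.template
    out = []
    pos = 0
    # Template.pattern is the compiled regex string.Template uses internally;
    # it exposes the groups "escaped", "named", "braced" and "invalid".
    for m in template.pattern.finditer(text):
        out.append(re.escape(text[pos:m.start()]))  # literal text before the placeholder
        name = m.group("named") or m.group("braced")
        if name:
            out.append("(?P<" + name + ">" + field + ")")
        elif m.group("escaped") is not None:
            out.append(re.escape(template.delimiter))  # "$$" stands for a literal "$"
        else:
            raise ValueError("invalid placeholder at index %d" % m.start())
        pos = m.end()
    out.append(re.escape(text[pos:]))  # trailing literal text
    return re.compile("".join(out))

regex = template_to_regex(Template("${Name1}_${Name2}"), delimiter="_")
print(regex.fullmatch("foo_bar").groupdict())  # {'Name1': 'foo', 'Name2': 'bar'}

regex = template_to_regex(Template("$Name1-$Name2"), delimiter="-")
print(regex.fullmatch("foo-bar").groupdict())  # {'Name1': 'foo', 'Name2': 'bar'}
```

This sketch does not handle the $[ignore] syntax, which string.Template itself reports as an invalid placeholder, and a placeholder name that appears twice in the template would make re.compile raise an error.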